# Descriptive and theoretical approaches to African linguistics

Selected papers from the 49th Annual Conference on African Linguistics

Edited by Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker

Contemporary African Linguistics 6

#### Contemporary African Linguistics

Editors: Akinbiyi Akinlabi, Laura J. Downing




Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.). 2022. *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics* (Contemporary African Linguistics 6). Berlin: Language Science Press.

This title can be downloaded at: https://langsci-press.org/catalog/book/306

© 2022, the authors

Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/

ISBN: 978-3-96110-340-9 (Digital), 978-3-98554-036-5 (Hardcover)

ISSN: 2511-7726

DOI: 10.5281/zenodo.6358613

Source code available from www.github.com/langsci/306

Errata: paperhive.org/documents/remote?type=langsci&id=306

Cover and concept of design: Ulrike Harbort

Proofreading: Abbie Hantgan, Adaobi Ngozi Okoye, Alexandr Rosen, Alexandru Craevschi, Amir Ghorbanpour, Brett Reynolds, Bruce Wiebe, Cecilia Eme, Christopher Straughn, Emuobonuvie M. Ajiboye, Jean Nitzke, Jenneke van der Wal, Jeroen van de Weijer, John-Patrick Doherty, Lachlan Mackenzie, Lauren Schneider, Liliane Hodieb, Lotta Aunio, Marten Stelling, Mohammad Younes, Samuel Atintono, Sandra Auderset, Sean Stalley, Steve Pepper, Tihomir Rangelov, Yvonne Treis

Fonts: Libertinus, Arimo, DejaVu Sans Mono

Typesetting software: XeLaTeX

Language Science Press, xHain, Grünberger Str. 16, 10243 Berlin, Germany

https://langsci-press.org

Storage and cataloguing done by Zenodo



# **Preface**

This volume contains a selection of revised peer-reviewed papers from the 49th Annual Conference on African Linguistics, held at Michigan State University in 2018. About 170 papers on both descriptive and theoretical aspects of African linguistics were presented at the conference. Areas of interest included phonetics, phonology, morphology, syntax, semantics, sociolinguistics, historical linguistics, discourse analysis, language documentation, computational linguistics and more. Presenters were both graduate students and more senior scholars based in North America, Africa and beyond. Their research covered all major regions of Africa and most language families found in the continent.

The volume editors would like to thank those who contributed to the success of the conference through their generous support and those who made the publication of this volume possible. These include the Association of Contemporary African Linguistics, the African Studies Center at Michigan State University (MSU) under the late Director James A. Pritchett and later under the stewardship of Jamie Monson, the MSU Department of Linguistics, Germanic, Slavic, Asian and African Languages (now Department of Linguistics, Languages, and Cultures) then chaired by Sonja Fritzche and later Jason Merrill, the College of Arts and Letters especially Dean Christopher P. Long, Associate Dean Sonja Fritzche and Bill Hart-Davidson (Associate Dean for Research and Graduate Studies), and MSU's Writing Center. Many thanks to Yen-Hwei Lin, the Chair of the Department of Linguistics, Languages, and Cultures for providing much-needed support. We would also like to express our gratitude to previous ACAL conference hosts who shared technical information with us, particularly UC Berkeley and Indiana University. Thanks also to our graduate and undergraduate student helpers, Foreign Language Teaching Assistants (FLTAs), tutors, administrative staff, and friends for their various roles in supporting the 49th ACAL Conference. Tanner Schudlich, Nicole McKenzie, Nellie Hunter, Lisa Hinds, Michel Burton, Ok-Sook Park, Brahim Chakrani, Salamatu Abdulkarreem, Morgan Momberg, Magdalyne Oguti, Idris Abubakar, Felix Umeana, Emily Skupin, Yan Cong, Rachel Stacey, Adam Smolinski, and Yuakai Chen all deserve special mention.

We also take this opportunity to thank many reviewers and proofreaders who took some time out of their busy schedules to help us select and improve the quality of the papers in this volume. Reviewers include Xiayimaierdan Abudushalamu, Gregory D. S. Anderson, Matthew Baerman, William G. Bennett, Lee Bickmore, Robert Botne, Kenyon Branan, Leston Buell, Mike Cahill, Yan Cong, Toni Cook, Katherine Demuth, Thabo Ditsele, Michael Diercks, Laura Downing, James Esegbey, Daniel Finer, Joash J. Gambarage, John Gluckman, John Goldsmith, Richard T. Griscom, Claire Halpert, Larry Hyman, Peter Jenks, Kyle Jerro, Jason Kandybowicz, Boniface Kawasha, Ettien Koffi, Ruth Kramer, Nancy C. Kula, Karsten Legère, Gastor C. Mapunda, Michael Marlo, Leonard Muaka, Laura McPherson, Steve Nicolle, David Odden, Kenneth S. Olson, Mary Paster, Doris Payne, Asia Pietraszko, Douglas Pulleyblank, Ronald Schaefer, Sharon Rose, Josephat Rugemarila, Ken Safir, Hannah Sande, Sylvester R. Simango, Jenneke van der Wal, Aggrey Wasike, Jochen Zeller and Patricia Schneider-Zioga.

The volume editors also express their gratitude to the Contemporary African Linguistics series editors Michael Marlo, Laura Downing, and Akinbiyi Akinlabi for providing support and direction. At Language Science Press, we greatly appreciate all the assistance we received and the quality of work from Sebastian Nordhoff, Felix Kopecky and other members of their team. In short, the volume editors are grateful to all who provided the necessary support: presenters, invited speakers, conference audiences, contributors to this volume, publishers, and all those who worked behind the scenes.

# **Chapter 1**

# **Velar Tap in Dàgáárè**

Samuel Akinbo, Alexander Angsongna, Avery Ozburn, Murray Schellenberg & Douglas Pulleyblank

University of British Columbia

Bodomo (1997) describes intervocalic velar [ɡ] in Dàgáárè as fricative [ɣ]. With 42 tokens of intervocalic [ɡ] from a native speaker of Dàgáárè, we investigated the acoustic and articulatory features of the Dàgáárè intervocalic velar [ɡ] using ultrasound images, waveforms, spectrograms, and palatograms. The results of the study suggest that Dàgáárè intervocalic [ɡ] is not a fricative but a velar with strong tap-like features, a previously unattested sound in natural language (Ladefoged 1990). Following from this, we conclude that Dàgáárè intervocalic velar [ɡ] is not a fricative but a tap.

## **1 Introduction**

Dàgáárè is a Gur language of the Niger-Congo family, part of a language group known as the Mabia languages. It is spoken by about 1.5 million people in northwestern Ghana and some parts of Burkina Faso (Kennedy 1966, Bodomo 1997).

Dàgáárè is described as having twenty-five consonants and two underlying glides (Bodomo 1997). The vowel inventory contains nine vowels, with tongue root contrasts for high and mid vowels, but a single low vowel [a]. In Bodomo's (1997) description of the consonant inventory, the voiced velar stop [ɡ] is said to alternate with [ɣ] intervocalically. The data included with this description is the single word (/pɔ́gɔ́/ 'woman') where [ɡ] occurs between RTR vowels. According to our auditory impression, including that of the second author who is a native speaker, intervocalic <ɡ> is not a velar fricative.

This paper describes an acoustic and articulatory study of Dàgáárè <ɡ> in Central Dàgáárè, spoken in Nadowli-Kaleo district in Ghana. Waveforms, spectrograms, duration, ultrasound images, and static palatograms of intervocalic <ɡ> are studied. The acoustic and articulatory results show that intervocalic <ɡ> has the complex waveform, amplitude variation, formant structure, tongue movement, and closure typical of a tap, rather than a velar fricative.

## **2 Methodology**

The data come from a native speaker of Dàgáárè and were collected in a room at the ISRL Lab, University of British Columbia, using a Sennheiser MKH 8060 shotgun microphone at a sampling rate of 44 kHz (16-bit).

An Aloka Pro-Sound SSD 5000 ultrasound machine with an Aloka UST-9119-3.5 convex transducer (pulse frequency 3.5 MHz, field of view 120°) collected a moving image of tongue movement. The ultrasound probe was positioned manually against the mylohyoid muscle and was kept stable with a mechanical arm. The stimuli for ultrasound and acoustic studies contain 42 tokens with intervocalic [ɡ]. Each token was repeated twice.

To determine the place of articulation of the closure, a palatogram was recorded. The tongue was painted with charcoal mixed in olive oil before the participant produced four tokens with intervocalic <ɡ>. After each token was articulated, an image of the soft palate was captured.

## **3 Results**

All instances of intervocalic <ɡ> were segmented manually in Praat (Boersma 2002) and a script was used to extract duration values. The waveform and spectrogram were manually extracted.
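The duration step can be sketched outside Praat as follows. This is only an illustration, not the script used in the study: the boundary times are hypothetical placeholders rather than measurements, and `segment_durations` is a name introduced here.

```python
from statistics import mean


def segment_durations(boundaries):
    """Return the duration in seconds of each (start, end) interval."""
    durations = []
    for start, end in boundaries:
        if end <= start:
            raise ValueError(f"interval ({start}, {end}) has no positive duration")
        durations.append(end - start)
    return durations


# Hypothetical segment boundaries in seconds (NOT measurements from the study).
boundaries = [(0.412, 0.465), (1.020, 1.078), (1.733, 1.787)]

durations = segment_durations(boundaries)
mean_ms = mean(durations) * 1000  # convert seconds to milliseconds
print(f"{len(durations)} tokens, mean duration {mean_ms:.1f} ms")
```

In practice the boundaries would come from the manually segmented intervals, one pair per token of intervocalic <ɡ>.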

The waveform of Dàgáárè <ɡ> has a decrease in amplitude compared to surrounding vowels, but it is complex as can be seen in Figure 1. This is similar to the expected properties of a tap, but distinct from both voiced velar stops and resonants; from a voiced stop, we would expect a simple waveform for voicing, while with a resonant, we would not expect an amplitude decrease.

In the spectrogram of Dàgáárè <ɡ>, we regularly see formant structure throughout the consonant. This is typical of resonants and possible for taps but is not consistent with a stop. For a [ɡ], we would expect a gap in the spectrogram with a voicing bar at the bottom; but this is not what we see for Dàgáárè <ɡ>. With a fricative, we would expect random noise, which is again not what we see. The spectrogram also shows that <ɡ> does not feature spectral energy like the other non-sibilant fricative [v] in the spectrogram or a dorsal fricative (see Jesus & Shadle 2005).

Figure 1: Waveform (top) and spectrogram (bottom) of Dàgáárè <ɡ>

In terms of duration (including both closure duration and release duration), the average duration of the collected <ɡ> tokens was 0.055 seconds. By comparison, this is substantially shorter than English [ɡ], which has a duration of around 0.081 seconds (Byrd 1993: closure duration 54 ms, release duration 27 ms). It is also longer than an alveolar tap, which tends to have a duration between 0.028 and 0.041 seconds. The durations for the Dàgáárè velar can be seen compared to English [ɡ] and [ɾ] in Figure 2.
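The arithmetic behind this comparison can be laid out explicitly. The sketch below uses only the values reported in the paragraph above, converted to milliseconds; the variable names are introduced here for illustration.

```python
# Values reported in the text, in milliseconds.
DAGAARE_G_MS = 55.0
ENGLISH_G_CLOSURE_MS = 54.0
ENGLISH_G_RELEASE_MS = 27.0
ALVEOLAR_TAP_RANGE_MS = (28.0, 41.0)

# Closure plus release gives the total duration of English [g].
english_g_ms = ENGLISH_G_CLOSURE_MS + ENGLISH_G_RELEASE_MS

# How the Dagaare segment sits relative to the two reference points.
ratio_to_english_g = DAGAARE_G_MS / english_g_ms
excess_over_tap_ms = DAGAARE_G_MS - ALVEOLAR_TAP_RANGE_MS[1]

print(f"English [g] total: {english_g_ms:.0f} ms")
print(f"Dagaare <g> relative to English [g]: {ratio_to_english_g:.2f}")
print(f"Dagaare <g> minus longest tap value: {excess_over_tap_ms:.0f} ms")
```

On these figures the Dàgáárè segment is roughly two-thirds the length of English [ɡ] and 14 ms longer than the upper end of the reported alveolar tap range.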

On the ultrasound, the tongue movement between the vowel position and the consonant was substantial; the tongue back raised towards the palate/velum. This degree of movement is consistent with either a stop or a tap, because the tongue moves far from the vowel position to make closure. A resonant would have less movement, due to the lack of closure. The ultrasound images can be seen in Figure 3. A fricative would have less movement than a stop or tap, but more than a resonant; these images are potentially consistent with a fricative.

Figure 2: Dàgáárè <ɡ> relative to English [ɡ] and [ɾ]

Figure 3: Intervocalic <ɡ> in Dàgáárè

In the palatography shown in Figure 4, the pattern of charcoal left on the palate after production of <ɡ> showed evidence of closure in the palatal/velar region. Closure is typically seen for stops and taps, but not for resonants or fricatives.

In summary, although the Dàgáárè <ɡ> has a longer duration than an alveolar tap, its production is most consistent with the behaviour of a tap, in terms of waveform, spectrogram, ultrasound, and palatography. In particular, it is not consistent with a stop or a resonant in a number of ways. These results are summarized in Table 1.
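The reasoning in this section amounts to checking the observations against the expectations for each segment type. The sketch below encodes only the categorical expectations stated in the prose above; duration and degree of tongue movement are left out because the text treats them as graded rather than yes/no. The feature names and the `mismatches` helper are introduced here for illustration.

```python
# Expectations per segment type, restricted to what the text states.
# A feature absent from a type's dictionary means the text makes no claim.
EXPECTED = {
    "stop": {"complex_waveform": False, "formant_structure": False, "closure": True},
    "resonant": {"amplitude_decrease": False, "formant_structure": True, "closure": False},
    "fricative": {"random_noise": True, "closure": False},
    "tap": {
        "complex_waveform": True,
        "amplitude_decrease": True,
        "formant_structure": True,
        "closure": True,
    },
}

# Observations reported for intervocalic <g>.
OBSERVED = {
    "complex_waveform": True,
    "amplitude_decrease": True,
    "formant_structure": True,
    "random_noise": False,
    "closure": True,
}


def mismatches(expected, observed):
    """List the features on which the observation contradicts the expectation."""
    return sorted(f for f, value in expected.items() if observed[f] != value)


results = {seg: mismatches(exp, OBSERVED) for seg, exp in EXPECTED.items()}
for segment_type, conflicts in results.items():
    print(f"{segment_type}: {conflicts if conflicts else 'no conflicts'}")
```

Only the tap profile yields no conflicts; each of the other three types is contradicted on two features.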

## **4 Discussion and conclusion**

The results show that intervocalic [ɡ] in Dàgáárè has a complex waveform, amplitude decrease, formant structure, a short duration, significant tongue movement, and closure. These features are strong tap-like features and suggest that Dàgáárè intervocalic velar [ɡ] is not a velar fricative but a tap. Such a segment type has previously been unattested and predicted, moreover, to be impossible (Ladefoged 1990). Given cross-linguistic evidence that velar softening mostly results in palatalization (Halle 2005) and the charcoal stain on the participant's velum and hard palate in the palatograms, we note however that the intervocalic velar in Dàgáárè could be a palatal tap, a sound which is also unattested but predicted to be possible.

Given that this study was based on data from a single native speaker of Dàgáárè, future work should focus on a larger population sample of Dàgáárè speakers. Dàgáárè intervocalic velar [ɡ] should also be compared with velar [ɡ] in clusters and related segments in related languages, e.g. lenited velars in Dagbani (Hudu 2010). This is a logical direction considering the argument in Elugbe 1978 that the lenis consonants in Edoid languages are taps.

Generally, this study has shown that Dàgáárè intervocalic [ɡ] is not a fricative but either a velar tap or a palatal tap, both of which are previously unattested sounds. Based on these findings, Dàgáárè velar [ɡ] requires further investigation.

## **Examples of words with voiced velar [ɡ] in Dàgáárè**


Figure 4: Palatogram showing closure


#### Table 1: Result summary

*a* (Byrd 1993)

*b* (Ting 2007)


## **Acknowledgments**

This work was supported by a grant to Pulleyblank from the Social Sciences and Humanities Research Council of Canada (SSHRC).

## **References**

Bodomo, Adams. 1997. *The structure of Dagaare*. Stanford, CA: CSLI Publications.

Boersma, Paul. 2002. Praat: A system for doing phonetics by computer. *Glot International* 5(9/10). 341–345.


# **Chapter 2**

# **On the Ngbugu vowel system**

Kenneth S. Olson

SIL International

Previous researchers have posited asymmetric oral vowel systems for Ngbugu and other Banda languages. The present analysis shows that Ngbugu has a symmetric ten-vowel system which includes one interior vowel /ə/ and lacks vowel harmony. It also supports and refines Boyeldieu & Cloarec-Heiss's (2001) proposed Proto-Banda vowel system. The affinities of the resulting proto vowel system to those of nearby languages could facilitate the comparison of vowel systems across the region in order to test hypotheses about shared inheritance or borrowing. Possible explanations for the lack of vowel harmony are suggested.

## **1 Introduction**

Authors of previous studies have proposed various oral vowel systems for Ngbugu (ISO 639-3 code=lnl), a language of the Banda group, Ubangian family, spoken in southcentral Central African Republic by about 95,000 people (Simons & Fennig 2018). These proposed systems are shown in Tables 1, 2, and 3.<sup>1</sup>


Table 1: Ngbugu oral vowels (Cloarec-Heiss 1978: 13–14)

<sup>1</sup>I use standard IPA transcriptions for segments and tone in this paper.

Kenneth S. Olson. 2022. On the Ngbugu vowel system. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 9–23. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393734


The system in Table 1 contradicts a universal put forth by Crothers (1978: 122): "The number of height distinctions in front vowels is equal to or greater than the number in back vowels." Yet, the majority of Banda languages appear to exhibit it. Cloarec-Heiss (1978: 13–16) posits this same system for seven other Banda speech varieties: Langbasi [lna], Ngundu [nue], Kpagua [kuw], Gubu [gox], Gbi [bbp], Linda [liy], and Yakpa [bjo]. It is also the system posited for Mono [mnh] by both Kamanda-Kola (2003) and Olson (2005) independently of each other.


Table 2: Ngbugu oral vowels (Théret-Kieschke 1998: 43)

Théret-Kieschke's proposed system in Table 2 is more symmetric. In addition to monophthongs, she posits two diphthongs. She considers /ɔ/ to be a marginal phoneme (p. 9).



Boyeldieu & Cloarec-Heiss propose the system shown in Table 3. They transcribe the low front vowel as /(i)ɛ/, capturing the generalization – according to their data – that the phoneme is usually realized as [iɛ], yet surfaces as [ɛ] in initial position and immediately following /w/. The vowel /ɔ/ is positioned as a mid vowel in their chart, along with /e/ and /ə/ (p. 192).<sup>2</sup>

Comparison of these three proposed systems raises questions about the phonemic status of /ɛ/, /o/, and /ɔ/. To address these questions, I worked with a team of three native Ngbugu speakers during three visits to Bangui from 2015 to 2017 (a total of five weeks) to re-evaluate the oral vowel system, employing the participatory research methodology elucidated in Kutsch Lojenga (1996).<sup>3</sup> I do not address diphthongs or nasal vowels, which are both also part of the Ngbugu vowel system.

<sup>2</sup>Boyeldieu (pers. comm.) considers /ɔ/ to be phonetically halfway between [o] and [ɔ] when it does not follow a /w/.

Prior to our consultations, the team had collected a corpus of about 2000 lexical items. We removed compound words, borrowings, derived words, etc., after which we had a corpus of about 700 words to work with.

In §2, I provide evidence for the phonemic status of /ɛ/, /o/, and /ɔ/, as well as an in-progress merger between /o/ and /ɔ/. I also provide evidence for the existence of an additional vowel /ʊ/ not reported by the previous researchers. In §3, I provide acoustic evidence for my transcription of the vowels and show that /ɨ/ is best reinterpreted as the front vowel /ɪ/. In §4, I discuss possible implications for the historical development of vowels in the Banda group.

## **2 Phonology**

Several arguments support the phonemic status of /ɛ/. First, contrasts between /ɛ/ and its phonetically similar segment /e/ are common in Ngbugu. A sampling of these contrasts is shown in Table 4.


Table 4: Contrasts between /ɛ/ and /e/
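The kind of contrast at issue, two words differing only in /ɛ/ vs. /e/, can be searched for mechanically in a word list. The sketch below is an illustration under loud assumptions: the forms are invented placeholders and NOT Ngbugu data, each character stands for one segment, and tone is ignored. `is_minimal_pair` is a name introduced here.

```python
from itertools import combinations


def is_minimal_pair(form_a, form_b, sound_x, sound_y):
    """True if the forms differ in exactly one position, by sound_x vs. sound_y."""
    if len(form_a) != len(form_b):
        return False
    diffs = [(a, b) for a, b in zip(form_a, form_b) if a != b]
    return len(diffs) == 1 and set(diffs[0]) == {sound_x, sound_y}


# Hypothetical placeholder forms, one symbol per segment (NOT Ngbugu data).
forms = ["kɛ", "ke", "tɛla", "tela", "tola"]

pairs = [
    (a, b)
    for a, b in combinations(forms, 2)
    if is_minimal_pair(a, b, "ɛ", "e")
]
print(pairs)
```

A real corpus would first need to be segmented so that tone marks and multi-character segments such as [ŋɡ] or [ɡb] are treated as single units.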

Second, my language consultants had no difficulty distinguishing /ɛ/ and /e/, and the orthography testing they conducted in the Ngbugu community suggested that this is true among Ngbugu speakers in general. Third, /ɛ/ is not rare, occurring in more than 20 words in our corpus. This count does not include word-initial [ɛ], which may be a relic of an historical prefix denoting animals (Greenberg 1970: 13). Fourth, /ɛ/ occurs in word-initial, word-medial, and word-final positions, as shown in Table 5.

<sup>3</sup>The three language consultants conducted extensive orthography testing in the Ngbugu region during this period of time.


Table 5: Distribution of /ɛ/

While the phonetic sequence [iɛ] does occur in a substantial number of words, it is also true that [ɛ] is attested immediately following more consonants than just /w/, e.g. [ŋɡʊ́.lɛ̀.ʃò] 'earthworm', [ʃé.ʃɛ] 'bifurcation', [ɡbɛ̄] 'all', [ndɛ́̀.rə̀] 'sticky'. Additional examples are found in Boyeldieu's Ngbugu wordlist in *RefLex* (Segerer & Flavier 2011). There are also cases where [iɛ] and [ɛ] both occur after the same consonant, e.g. [ɡbīɛ] 'king' vs. [ɡbɛ̄] 'all'. These considerations bolster the view that /ɛ/ is a phoneme in its own right, distinct from the diphthong /iɛ/.

Finally, there is a clear acoustic distinction between /ɛ/ and /e/, as discussed below in §3.

Several arguments support the phonemic status of both /o/ and /ɔ/. First, there is contrast between these two phonetically similar segments, as shown in Table 6.


Table 6: Contrasts between /o/ and /ɔ/ (Théret-Kieschke 1998: 9)

Second, many native speakers have no difficulty distinguishing the two sounds. Third, both sounds are common, each occurring in more than 30 words in my corpus. Fourth, there is a clear acoustic distinction between the two sounds, as discussed below in §3.

Despite this evidence, the case of /o/ and /ɔ/ is complicated by a couple of factors. First, for apparently all speakers of Ngbugu – even those for whom the two sounds are contrastive – free variation occurs between /o/ and /ɔ/ for certain lexical items, as exemplified in Table 7.

Table 7: Free variation between /o/ and /ɔ/ in some lexical items

Second, while all three of my language consultants recognize the distinctiveness of /o/ and /ɔ/, they indicate that some Ngbugu speakers do not distinguish the two sounds, opting to produce /o/ in all cases. Théret-Kieschke (1998: 9) noted this pattern among younger speakers and suggested that a merger is currently underway between /o/ and /ɔ/: \*o, \*ɔ > o. Robust contrast exists between /o/ and /ɔ/ in most other Banda varieties, e.g. Linda (Boyeldieu & Cloarec-Heiss 2001) and Mono (Olson 2005), which harmonizes well with this claim.

During the course of our research, we encountered an additional synchronic vowel phoneme, /ʊ/, not attested by previous researchers. Several factors support the existence of this additional phoneme. First, there is contrast between /ʊ/ and all of the other back vowels, as shown in Table 8.


Table 8: Contrast between /ʊ/ and other back vowels

Second, native speakers readily distinguish /ʊ/ from other back vowels. Third, /ʊ/ is not rare, occurring in over 40 lexical items in my corpus. Fourth, /ʊ/ occurs both medially and finally, e.g. [sʊ̀.ɡʊ́] 'pillar'. Finally, /ʊ/ is distinct acoustically from the other back vowels, as discussed in §3.

While /ʊ/ does not occur as such in any of the previous research, it does show up indirectly in Boyeldieu & Cloarec-Heiss (2001), where the sequence transcribed there as /wɔ/ corresponds to my /ʊ/, as shown in Table 9.<sup>4</sup>

Though Boyeldieu & Cloarec-Heiss do not document extant /ʊ/ per se, their comparative study of Ngbugu and Linda leads them to reconstruct the high back [–atr] vowel \*ʊ for Proto-Banda (pp. 199, 202–203). One of the key findings of the current paper is that the reflex /ʊ/ of the proto phoneme \*ʊ is present in Ngbugu today.

<sup>4</sup>Most of the occurrences of [ʊ] in Table 9 occur after [ʁ], but most cases of [ʊ] in my corpus follow other consonants. The difference between my data and that of Boyeldieu and Cloarec-Heiss could be due in part to dialectal variation. More research is necessary on this.


Table 9: Samples of /ʊ/ transcribed as /wɔ/ by Boyeldieu & Cloarec-Heiss (2001)

This finding suggests a possible novel use of the comparative method – as an aid to linguistic fieldwork. The comparison of the sound systems of related languages can be used not only to reconstruct proto phonemes, but it can also lead to hypotheses about the structure of the synchronic sound systems and hence serve as a diagnostic for examining them more closely. In the current case, Boyeldieu & Cloarec-Heiss's positing of \*ʊ led my language consultants and me to examine more carefully the high back vowels of Ngbugu, eventually unearthing /ʊ/.

## **3 Acoustic properties**

An acoustic study was undertaken in order to verify the transcription of the Ngbugu vowels. The subject was a male native speaker of Ngbugu in his late 40s at the time of the recordings. He grew up in the Ngbugu region, and both of his parents speak Ngbugu as their first language. He has obtained his *baccalauréat* and has taken some university courses. He moved to Bangui in 2014. The recordings were made in 2015 and 2016 in Yaoundé and Bangui, respectively. Besides Ngbugu, he is also fluent in Sango [sag] and French [fra].

The set of data was recorded at 48 kHz, 24-bit, using a Zoom H2 recorder, and saved as WAV files. The 2015 recording session took place at the SIL center in Yaoundé, and the 2016 recording session took place at the ACATBA (Association Centrafricaine pour la Traduction de la Bible et l'Alphabétisation) center in Bangui. Twelve tokens of each vowel were chosen for analysis. This included two tokens of each vowel spoken in isolation. In most of the words selected, the vowel followed a coronal consonant.

Acoustic analysis was performed using Praat v. 6.0.37 (Boersma & Weenink 2018). I first visually inspected a wide-band spectrogram of each token to verify that there was a steady state period of the vowel. I then visually identified the midpoint of the steady state. The window of analysis was centered on this midpoint. The formant measurements were made using the LPC analysis feature in Praat, employing its default parameters, except that the "Maximum formant (Hz)" setting was changed from 5500 Hz to 5000 Hz, the latter being more appropriate for a male speaker (Boersma & Weenink 2018). Because LPC calculations of *F*<sub>1</sub> can potentially be influenced by a high *f*<sub>0</sub>, I verified the formant measurements by visual inspection on a wide-band spectrogram and spectral slices, when appropriate. The *F*<sub>1</sub> vs. *F*<sub>2</sub> plot is shown in Figure 1.
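The placement of the analysis window can be sketched as follows. This illustrates only the centering step, not the LPC analysis itself, which was done in Praat. The steady-state times and the window length are hypothetical values chosen for the example, and `centered_window` is a name introduced here; only the 5000 Hz ceiling comes from the text.

```python
def centered_window(steady_start, steady_end, window_length):
    """Return (midpoint, window_start, window_end) in seconds."""
    if steady_end <= steady_start:
        raise ValueError("steady state must have positive duration")
    if window_length <= 0:
        raise ValueError("window length must be positive")
    midpoint = (steady_start + steady_end) / 2
    half = window_length / 2
    return midpoint, midpoint - half, midpoint + half


# Formant ceiling reported in the text for this male speaker.
MAX_FORMANT_HZ = 5000

# Hypothetical steady-state boundaries and window length, in seconds.
midpoint, start, end = centered_window(0.30, 0.40, window_length=0.05)
print(
    f"midpoint {midpoint:.3f} s, window {start:.3f} to {end:.3f} s, "
    f"formant ceiling {MAX_FORMANT_HZ} Hz"
)
```

Centering on the steady-state midpoint keeps the measurement away from the consonant transitions at either edge of the vowel.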

Figure 1: Formant plot of Ngbugu vowels (12 tokens each)

Several observations can be made about this plot. First, the *F*<sub>2</sub> of what I have been transcribing as /ɨ/ is generally higher (~1600 Hz) than the *F*<sub>2</sub> of /ə/ (~1400 Hz), approaching the front vowels /i/ and /e/. This suggests that /ɨ/ may best be construed as a front vowel. I will transcribe it as /ɪ/ for reasons that will soon become apparent.

Second, the positioning of /ɪ/ and /ʊ/ in the plot generally corresponds to what we expect for high [–atr] vowels. In Starwalt's (2008) crosslinguistic study of the acoustics of atr vowel harmony systems, she found that the *F*<sub>2</sub> of /ʊ/ is consistently lower than the *F*<sub>2</sub> of /o/ for the African languages she studied, although this was not always statistically significant: Foodo [fod] and Ikposo [kpo] (*Kwa*), Kinande [nnb] and LuBwisi [tlj] (*Bantu*), and Ekiti-Yoruba [yor] (*Defoid*). The positioning of /ʊ/ vis-à-vis /o/ in Figure 1 is consistent with this.

With respect to front vowels, Starwalt found some variation: for some speakers of Foodo (p. 105), Kinande (p. 128), and LuBwisi (p. 136), the *F*<sub>2</sub> of /ɪ/ is lower than the *F*<sub>2</sub> of /e/. This is consistent with what I found for Ngbugu. For the rest of Starwalt's speakers, the *F*<sub>2</sub> of /ɪ/ was higher than the *F*<sub>2</sub> of /e/.

This acoustic study is preliminary. Testing additional subjects would help confirm that our data are indicative of the larger Ngbugu-speaking population. Ladefoged (2003) suggests testing a half-dozen speakers of each sex.

## **4 Discussion**

#### **4.1 Vowel system symmetry**

If /ɨ/ is reinterpreted as /ɪ/, as proposed in §3, the resulting Ngbugu vowel system becomes symmetric, as shown in Table 10.


Table 10: Reanalyzed Ngbugu oral vowel system
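The effect of the reinterpretation on symmetry can be checked by counting vowels per column. The sketch below is an illustration under stated assumptions: the ten vowels are those listed in this chapter, their assignment to front, central, and back columns follows the standard IPA classification rather than a table reproduced here, and each vowel in a column is assumed to be a distinct height. The function names are introduced here.

```python
def is_symmetric(inventory):
    """True if the front and back columns have the same number of vowels."""
    return len(inventory["front"]) == len(inventory["back"])


def satisfies_crothers(inventory):
    """Crothers (1978): front height distinctions >= back height distinctions.

    Assumes each vowel within a column is a distinct height.
    """
    return len(inventory["front"]) >= len(inventory["back"])


# Inventory with the high non-back vowel treated as central /ɨ/.
with_high_central = {
    "front": ["i", "e", "ɛ"],
    "central": ["ɨ", "ə", "a"],
    "back": ["u", "ʊ", "o", "ɔ"],
}

# Inventory after reinterpreting /ɨ/ as front /ɪ/.
reanalyzed = {
    "front": ["i", "ɪ", "e", "ɛ"],
    "central": ["ə", "a"],
    "back": ["u", "ʊ", "o", "ɔ"],
}

for name, inv in [("with /ɨ/", with_high_central), ("with /ɪ/", reanalyzed)]:
    print(name, "symmetric:", is_symmetric(inv), "Crothers:", satisfies_crothers(inv))
```

Moving a single vowel from the central to the front column is enough to balance four front heights against four back heights.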

Not only is symmetry what is generally expected for vowel systems (Pike 1947: 59), it is also what is found in most languages of the region, as shown in Table 11. The languages of all of the Ubangian subgroups except Banda exhibit symmetric vowel systems, as do many of the languages from the nearby Central Sudanic group Bongo-Bagirmi.

In fact, the reanalyzed Ngbugu vowel system is similar to the set of Proto-Banda monophthongs reconstructed by Boyeldieu & Cloarec-Heiss (2001), shown in Table 12. The differences are (1) the presence of a high central vowel \*ɨ, and (2) the absence of the front vowels \*ɪ and \*ɛ.

Table 11: Vowel systems of sample languages in geographic proximity ("cross" = cross-height harmony)

Table 12: Proto-Banda vowel system (Boyeldieu & Cloarec-Heiss 2001)

It is not surprising that Boyeldieu and Cloarec-Heiss posited the proto phoneme \*ɨ. At the time of their study, it was thought that all Banda languages had a high central vowel. The finding that Ngbugu has extant /ɪ/ instead opens up the option of positing the proto phoneme \*ɪ (rather than \*ɨ) with a corresponding sound change \*ɪ > ɨ to account for the presence of /ɨ/ in the other Banda varieties (subject to confirmation via the comparative method). This also leads to a more typologically common proto vowel system.

As for the absence of \*ɛ, Boyeldieu & Cloarec-Heiss posit instead the proto diphthong \*ia (pp. 198–199). In their analysis, occurrences of [iɛ] in Ngbugu following labial, alveolar, and velar consonants are combined with occurrences of [ia] in Ngbugu following palatal consonants in order to reconstruct the proto diphthong. The corresponding Linda forms are [eya] following labials, [ia] following alveolars and velars, and [a] following palatals. Given their data, an equally valid reconstructed form would be \*iɛ. Absent from their correspondence sets are occurrences of [ɛ].<sup>5</sup>

If we examine cases of [ɛ], we see that Ngbugu [ɛ] corresponds with Linda [ja] (Moñino 1988) in word-initial position, as shown in Table 13.

<sup>5</sup>There are some residual items in Boyeldieu & Cloarec-Heiss's data: [iɛ] and [ia] contrast in [gia] 'tourner la pâte' vs. [giɛ̀] 'animal, viande'; and [ɛ] and [iɛ] contrast in [ŋgàɛ̀] ~ [ŋgɛ̀] 'canne à sucre' vs. [ŋgìɛ] 'noyau de la noix de palme' (pp. 192–193).


Table 13: Correspondences between Ngbugu [ɛ] and Linda [ja]

I was not able to identify cognates in Linda that correspond to the Ngbugu words in which [ɛ] is word-medial or word-final. Hence, more research is necessary. That being said, the distinction between the two correspondence sets (Ngbugu iɛ~ia vs. Linda ia~eja and Ngbugu ɛ vs. Linda ja) leads me to propose two proto phonemes, \*iɛ and \*ɛ, with a possible merger \*iɛ, \*ɛ > ja in Linda. The choice of \*ɛ leads to the Proto-Banda system in Table 14 that is not only typologically more common but is also nearly identical to the extant Ngbugu one.

Table 14: Reanalyzed Proto-Banda vowel system


One mystery of the Banda group has been its unusual inventory of vowels (cf. Table 1). The revised symmetric Ngbugu vowel system – and the comparable Proto-Banda system proposed here – are much more in line with those found in the surrounding languages. Of particular comparative interest are the vowel systems of Nzakara and Zande, since Nzakara is the immediate neighbor of Ngbugu to the northeast. Ngbugu, Nzakara, and Zande all have identical inventories of *phonetic* vowels: [i ɪ e ɛ ə a ɔ o ʊ u]. These similarities allow for the straightforward comparison of vowels between groups, something that was very difficult given our previous understanding of the vowel systems in the Banda group. This makes it more believable that the Banda group could be related to other language groups in the vicinity.

#### **4.2 Vowel harmony**

Boyeldieu & Cloarec-Heiss suggest that there are traces of a Proto-Banda atr harmony system in extant Linda (p. 189), and to a lesser degree in extant Ngbugu (pp. 196–197, 202). The existence in the current-day Ngbugu vowel system of contrasts between [+atr] and [–atr] vowels lends credence to the hypothesis of this earlier atr harmony system.

Indeed, the revised Ngbugu vowel inventory bears a remarkable resemblance to inventories that exhibit atr harmony. It is the same inventory as the ten-vowel systems that exhibit the "most straightforward" cases of atr harmony in Africa, where the vowels are divided into two groups: the [+atr] vowels /i e ə o u/ and the [–atr] vowels /ɪ ɛ a ɔ ʊ/ (Casali 2008: 499).

Yet, it is relatively clear that the current Ngbugu system does not exhibit vowel harmony, for two reasons. First, there are many cases in Ngbugu of monomorphemic words containing both [+atr] and [–atr] vowels, shown in Table 15.


Table 15: Monomorphemic words with both [+atr] and [–atr] vowels

Second, to my knowledge there are no cases of [+atr] ~ [–atr] alternations in Ngbugu roots or affixes (Casali 2008: 500).
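The first diagnostic, whether a word mixes vowels from the two atr sets, can be stated as a small check. The sketch below uses the two vowel sets given above (Casali 2008) and, as test items, two words cited earlier in this chapter; whether those particular words are monomorphemic is not claimed here. Tone diacritics are stripped via Unicode decomposition, and the function names are introduced for illustration.

```python
import unicodedata

# The two harmonic sets of a ten-vowel atr system, as given in the text.
PLUS_ATR = set("ieəou")
MINUS_ATR = set("ɪɛaɔʊ")


def atr_classes(word):
    """Return the set of atr classes ("+", "-") found among the word's vowels."""
    classes = set()
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            continue  # skip tone diacritics
        if ch in PLUS_ATR:
            classes.add("+")
        elif ch in MINUS_ATR:
            classes.add("-")
    return classes


def is_disharmonic(word):
    """True if the word contains both [+atr] and [-atr] vowels."""
    return atr_classes(word) == {"+", "-"}


# Words cited earlier in the chapter: 'earthworm' and 'pillar'.
for word in ["ŋɡʊ́.lɛ̀.ʃò", "sʊ̀.ɡʊ́"]:
    print(word, sorted(atr_classes(word)), is_disharmonic(word))
```

A word list in which many single-morpheme items come out as disharmonic is the pattern that argues against a synchronic harmony system.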

The absence of vowel harmony in Ngbugu is somewhat surprising given the preponderance of harmony systems elsewhere in the region (cf. Table 11). Systems which exhibit harmony between the two sets of mid vowels /e, o/ and /ɛ, ɔ/ (labeled "mid" in Table 11, VH column) are found in the Gbaya group (e.g. Gbeya), in Sere-Ngbaka-Mba (e.g. Ngbaka-Ma'bo), and to some degree in the Central Sudanic group Bongo-Bagirmi (e.g. Bagiro, which is immediately to the west of the Ngbugu region). The lingua franca Sango from the Ngbandi group also exhibits harmony of this type, but it has exceptions.<sup>6</sup>

Cross-height harmony systems in which both high and mid vowels undergo atr harmony on a surface phonetic level (labeled "cross" in Table 11, VH column) are found in both Nzakara and Zande. In these languages, high, mid, and low vowels all undergo atr harmony. For both languages, the mid vowels show harmony only at the surface phonetic level, i.e. [e] and [o] are allophones of /ɛ/ and /ɔ/, respectively. In addition, for Nzakara the [a] ~ [ə] alternation is also surface phonetic.

The presence of atr harmony in Nzakara and Zande, the identical inventories of phonetic vowels between Ngbugu, Nzakara, and Zande, and the traces of atr harmony identified by Boyeldieu and Cloarec-Heiss for Linda and Ngbugu – all these factors lead to the hypothesis that Proto-Banda exhibited atr harmony. This harmony system was either inherited or borrowed, and it was subsequently lost.

What could have led to the loss of atr harmony in Proto-Banda? There are at least a couple of factors to consider. First, in a crosslinguistic survey, Rolle et al. (2017) observe that atr harmony and interior vowels (e.g. [ɨ], [ə]) appear to be in an antagonistic relationship, and that the presence of both in a given vowel system is dispreferred. Perhaps Proto-Banda had both atr harmony and the phoneme /ə/ at some point in its history, and the atr harmony was subsequently lost due to pressure from the interior vowel.

Second, both Samarin (1982) and Cloarec-Heiss (1995) quote Brunache (1894: 205–206) who provides evidence for the existence of a Banda lingua franca in the region in the late 19th century. If this is true, the loss of atr harmony may have been a type of simplification of the language structure that is often associated with pidgins and lingua francas.

Either of these possibilities – internal systemic pressure or L2 simplification – could have contributed a certain instability to the phonological system, leading to the loss of the atr harmony, as well as other eventual structural changes.

<sup>6</sup> In discussing the simplification of Sango vis-à-vis the Ngbandi group, Samarin (2000: 313) states, "Co-occurrence of vowels has been simplified by vowel harmony: i.e., mid vowels in a single word are either tense or lax, not both."

## **5 Conclusion**

In summary, extant Ngbugu has a symmetric vowel system (including one interior vowel) that resembles vowel systems of the other groups in the region, except for the absence of vowel harmony. The extant atr contrasts in Ngbugu lend support to Boyeldieu and Cloarec-Heiss's (2001) reconstructed Proto-Banda vowel system containing atr contrasts. The traces of a vowel harmony system in Linda and Ngbugu, combined with the similarity of Ngbugu's surface phonetic vowel inventory to that of nearby languages that exhibit vowel harmony (particularly Nzakara), support the hypothesis that Proto-Banda may have had vowel harmony at some point in its history.

## **Acknowledgments**

I wish to thank Tychique Longbo, Jessé-Joël Adoumacho, and Guy-Florent Matchi for their collaboration and friendship, Connie Kutsch Lojenga for assistance with the methodology (and in particular being the first to identify /ʊ/), Pascal Boyeldieu, Mike Cahill, Bill Samarin, Coleen Starwalt, and participants at ACAL 49 for helpful comments (all errors are my own), and the SIL Central African Republic Service Group for funding.

## **References**


# **Chapter 3**

# **Phonological adaptation of the Belgian French vowels in Kinshasa Lingala**

## Philothé Mwamba Kabasele<sup>a,b,c</sup>

<sup>a</sup>University of Calgary <sup>b</sup>ISP/Gombe <sup>c</sup>University of KwaZulu-Natal

This study provides a systematic analysis of vowel sound adaptations in Kinshasa Lingala (KL) with evidence from acoustic phonetics. The research is restricted to the phonological adaptations of vowel sounds from Belgian French (BF). It provides evidence from loan data on the existence of the contrastive feature [±ATR] in the KL phonological system. Questions raised include: Does the phonological system of KL take precedence in the phonological adaptation process of the loanwords? Does similarity play a role in the adaptation of the loanwords? What happens when the foreign input does not offer any similarity with the phonological system of the recipient language (RL)? What happens when a feature or feature combination (FC) in a foreign input vowel either presents similarities with a feature or FC in the RL phonological system, or else does not present any similarities to any feature or FC in the phonological system of the RL? The data were extracted in a sentential context with a carrier sentence. Participants filled in the dots with the missing word that was suggested by the picture. The F1 and F2 measurement values, in hertz (Hz), were taken at three different points of the vowel spectrogram. A script also generated the average measurement values, which were used as input for statistical analysis. The null hypothesis (H<sub>0</sub>) predicts that BF [ɛ, œ, ø] would be adapted as [e] (H<sub>0</sub>: [ɛ] = [e], [œ] = [e], and [ø] = [e]) in KL, while the alternative hypothesis (H<sub>1</sub>) predicts that the BF vowels [ɛ, œ, ø] would not be adapted as [e] (H<sub>1</sub>: [ɛ] ≠ [e], [œ] ≠ [e], and [ø] ≠ [e]) in KL. Likewise, H<sub>0</sub> predicts that [ɔ] will be adapted as [o] (H<sub>0</sub>: [ɔ] = [o]). Due to the correlated nature of the data, Generalized Estimating Equations (GEE) were used to determine whether there were significant differences between two or more targeted variables. The findings show that KL speakers still discriminate between [ɛ] and [e], and between [ɔ] and [o], which implies the existence of an underlying contrast between the features [+ATR] and [−ATR].

Philothé Mwamba Kabasele. 2022. Phonological adaptation of the Belgian French vowels in Kinshasa Lingala. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 25–45. Berlin: Language Science Press. DOI: 10.5281/zenodo.6906649

### **1 Introduction**

The phonological adaptation of illicit foreign vowel sounds into the Kinshasa Lingala (KL) linguistic system has received very little scholarly attention (Bambi-Idikay 1998, Mudimbe et al. 1977). There has been no systematic analysis of vowel sound adaptations in KL with evidence from acoustic phonetics, nor any explanation of these adaptations in terms of phonetics and its interface with phonology. Such analysis and explanation are the focus of the present study. The current study is restricted to the phonological adaptations of vowel sounds from Belgian French (BF), as this general dialect of French was the primary source of loanwords in Lingala. The main goal of this study is to provide evidence from loan data on the existence of the contrastive feature [±ATR] in the phonological system of KL. The findings of this study serve as a diagnostic test.

This research is important because the findings of loanword adaptation allow linguists to understand the kind of cognitive transformation that a phonological system imposes on any linguistic input at the phonological level. It should be noted that an analysis which considers only native data may fail to account for the changes and fail to effectively capture the phonological system of the target language (TL). An investigation of a phonological system with data from loanword adaptation is relevant in that the output of loanword adaptation reflects all facets of the phonological structure of the borrowing language at the segmental, phonotactic, suprasegmental, and morpho-phonological levels (Kang 2011: 1). Channeling Berko (1958), Kang (2011: 1) suggests that "loanword phonology be considered as a real-life Wug test which can allow linguists to probe into the grammatical knowledge of speakers in ways that native data alone cannot". Kang thus shows the importance of studying loanword adaptation and what it can contribute to understanding the fine-grained phonological system of a TL. Loanword adaptation can be used as a diagnostic test of the phonological preferences and constraints of a TL.

Both integrated loanwords and on-line adaptations (Shinohara 1997) will be considered. On-line adaptations are loans which are borrowed "here and now" (Calabrese & Wetzels 2009: 66) whereas integrated loanwords refer to those adopted forms which have been made part of the recipient language's lexicon and whose source form appears to have been lost due to the phonological adaptation. The broad aim of this study is to identify various phonological strategies that Lingala speakers use when adopting and adapting BF vowels into their linguistic system. The study further aims to provide the phonetic-phonological motivation which justifies the choice of strategies during the loanword adaptation process and the different phonological constraints that regulate the adaptation process of vowels.

Previous research on loanwords (Mudimbe et al. 1977, Bambi-Idikay 1998) relied strictly on phonological conditions in Lingala to explain the adaptation of French words in Lingala. This exclusive focus on Lingala phonology is understandable, but the leading experts in loanword phonology, LaCharité and Paradis, warn that the source language phonology plays a preponderant role in the process of adaptation (see, e.g., LaCharité & Paradis 2005). They claim, as Kang et al. (2014: 1) put it, that "the adapters are competent bilinguals with native-like knowledge of the input language phonology and the phonological structure of the source language (SL) serves as input to the adaptation process, rather than the surface phonetic forms of the source language". LaCharité and Paradis's viewpoint is informed by multiple large-scale studies of loanword adaptation situations.

To make sense of vocalic adaptations in this study, among others, I will argue that three stages are involved in the adaptation process of an illicit input in the recipient language: the perceptual stage, the adaptation proper stage, and the implementation stage. The first and last stages are predominantly phonetically based. That is, phonetic factors play an important role in the adaptation process during these two stages. Yet the present study does not support the strictly phonetic approach to loanword adaptation which is advocated by Peperkamp & Dupoux (2003), Vendelin & Peperkamp (2004), and Peperkamp (2004), among others. This is because the adaptation proper stage is conceived as purely phonological. This is the stage in which the phonological constraints and preferences of the recipient language 'paint' their linguistic identity on the loanword to make it an accepted part of its linguistic system. I will suggest that loanword adaptation proper is a purely abstract process which leaves the speakers of a language with little choice. First, featural combinations which are similar between the SL input and the TL representation are generally preserved; this type of foreign input requires less cognitive effort since the SL features meet the phonological preferences of the TL.

Second, featural combinations which are dissimilar between the SL input and the TL representation are generally transformed and therefore sacrificed. This type of foreign input requires much cognitive effort since the SL features do not meet the phonological preferences of the TL. They may violate a number of the phonological constraints of the recipient language. Therefore, some repairs are imperative in order for the recipient language system to license the new linguistic form in the phonological system.

A third scenario concerns featural combinations that appear dissimilar between the SL input and the TL representation but which may reflect past preferences or presage future preferences in the TL, as when certain phonological structures are admitted which have either been deactivated in the synchronic system of the language or have not yet been observed in it. In this case the adapted linguistic form seems either to reveal reflexes of the diachronic system or to be ahead of its time in terms of the future linguistic preferences of the system, and it reveals the more abstract preferences of the language.

The next section provides some theoretical background on loanword adaptation. The following section briefly describes the research questions; section 4 is the study proper; section 5 the discussion and findings; and section 6 concludes the paper.

## **2 Background**

#### **2.1 Considerations**

Phonological adaptation is a process that affects a loanword in order to make it conform with the linguistic 'identity' of the recipient language. A loanword (in the process of phonological adaptation) is defined as a word that has been borrowed from a linguistic system that is identified as the source language (SL) and is then integrated into a recipient linguistic system that is called the target language (TL). During the process of its integration, the borrowed word may or may not undergo a phonological adaptation, which is mainly dictated by the phonological system of the TL that forces the loanword to abide by the phonological preferences and constraints of the recipient language.

A loanword is completely integrated in the borrowing language system when its illicit features become phonologically nativized in the TL in such a way that its foreign pronunciation becomes unrecognizable. Thomason (2001: 144) claims that some loanword adaptations occur through the process of 'negotiation' which is identified by the phenomenon of correspondence rules or borrowing routines. Thomason argues that "correspondence rules are (mostly) phonological generalizations drawn, consciously or unconsciously by bilinguals, though full fluency in both languages is not required" (p. 144). She presents the generalization in the form "'Your language has x where my language has y' and the rules are generally applied to nativize the phonology of loanwords" (p. 144). Kenstowicz (2006: 139), by contrast, claims that "[t]he adaptation of a loanword involves the resolution of often conflicting demands to preserve as much information from the source word as possible while still satisfying the constraints that make the lexical item sound like a word of the recipient language". This way of looking at phonological adaptation may be too simplistic. The phonological adaptation process additionally involves complex abstract linguistic mechanisms that transform an illicit foreign input to abide by the linguistic preferences of the borrowing language. The phonological system of the recipient language – being a component of an autonomous linguistic system – is not obviously concerned about preserving the information from the source language. It may, but need not, preserve the linguistic features which abide by its phonological preferences in order to license the loanword's integration into the recipient linguistic system.

The adaptation that a loanword undergoes is a phonological transformation that takes place at the behest of the phonological system of the host language. The loanword has to abide by the phonological constraints and preferences of the TL. As Kang (2011) puts it, "[s]uch adaptation affects all facets of phonological structure, reflecting the segmental, phonotactic, suprasegmental and morphophonological restrictions of the borrowing language" (p. 1). The output of the phonological adaptation process reflects "aspects of native speakers' linguistic knowledge" (Kang 2011: 1). Loanword adaptation, therefore, allows linguists to understand the kind of cognitive transformation that a phonological system imposes on any linguistic input at the phonological level. Loanword adaptation can be used as a diagnostic test of the phonological preferences and constraints of the TL. It even reveals likely preferences that are not yet observed in the phonological system of the TL.

#### **2.2 Theories on loanword adaptation**

Though some ad hoc factors have been identified in the literature on loanword adaptation, including orthography, morphology, and semantics (e.g. Vendelin & Peperkamp 2004, Adler 2006, Davis & Cho 2006, Miao 2006, Smith 2006a,b), three broad approaches to phonological adaptations are pursued in the literature (Lin 2009): the phonology approach, the perception approach, and the perception-phonology approach. This study adopts the perception-phonology approach.

#### **2.3 The perception-phonology approach**

A joint approach is proposed by Silverman (1992), Yip (1993, 2004), and Rose (1999b,a) who give credit to both the phonology and the perception approach by integrating the relevant features of the perceptual and the phonological components of the grammar. The perception-phonology approach claims that the input is shaped by the perceptual skills of the borrower, which determine how s/he decodes the acoustic signals of the source language. The adaptation process is dictated by the phonological system of the recipient language (e.g. Silverman 1992, Yip 1993, 2002, 2006, Steriade 2001, Kang 2003, Kenstowicz 2007, Kenstowicz & Suchato 2006, Miao 2006).

While the perception-phonology approach claims that "the input to the adaptation process is based on how the borrowers perceive the acoustic signals of the source language" (Lin 2009: 2), it does not identify the linguistic system that governs the perception of the foreign input. In other words, the approach does not predetermine whether the perception of a borrower is determined by the phonetic or phonological component of the language. Nor does it tell us whether the perceptual skills are part of the source language phonological system or whether they are the result of the recipient language phonological system. These questions are worth clarifying in order to shed light on the adaptation process of loanwords.

If we admit that adaptation is perceptual, we should equally admit that it is the phonological preferences (system) of the TL which interpret the foreign input in the way we perceive it. The phonological adaptation certainly starts at the perceptual level, but in turn it is dictated by the phonological preferences of the TL. For instance, if a foreign input is the aspirated stop [pʰ] and aspiration is not a preferred feature in the phonology of the TL (that is, it is not a component of the underlying representation of the language), native speakers of that TL may not perceive the aspiration because their grammar is unable to decode and interpret it as such. This is what Boersma & Hamann (2009: 6) argue: "perception is largely bound by native output constraints, so that a structure that violates native constraints cannot be perceived faithfully". Perceptual similarity is one of the most important cues in the process of loanword adaptation. It links the features of the foreign input to the features in the phonological system of the recipient language. Kang (2011: 6) argues that "foreign input is faithfully perceived yet can nevertheless be adapted to adhere to native phonotactic constraints".

This entails that the foreign input comes with its fully specified phonetic features which are adapted – or not – during the adaptation process. The input features that are similar to the phonological features in the recipient language are more likely to be adopted and preserved in the adaptation process than the input features which offer no similar matches in the recipient language.

### **3 Research questions**

This study focuses on the phonological strategies used to adapt BF vowels in loanwords in Lingala, as spoken in the Congolese capital city of Kinshasa. Questions raised include: Does the phonological system of KL take precedence (dictate its linguistic preferences) in the phonological adaptation process of the loanwords? Does similarity play a role in the adaptation of the loanwords? What happens when the foreign input does not offer any similarity with the phonological system of the recipient language? Of special interest is what happens when a feature or feature combination in a foreign input vowel either presents similarities with a feature or feature combination in the recipient language phonological system, or else does not present any similarities to any feature or feature combination in the phonological system of the recipient language.

The present study focuses on the BF vowels shown in (1a). All non-low vowels of KL are presented in (1b). Note that BF combines [+front] and [+round] in /y, ø, œ/, whereas this featural combination is disallowed in KL. Note, too, that the [−ATR] mid vowels /ɛ, ɔ/ are in parentheses in (1b) because it is widely claimed that these vowels have fully merged with /e, o/ in KL (e.g., Motingea Mangulu 2006: 20, Bokamba 2012: 303, Campbell & King 2013: 965). If this is correct, the feature [±ATR] is not at all contrastive in KL phonology, whereas it is in BF.

(1) a. Selected vowels of Belgian French


b. Non-low vowels of Kinshasa Lingala


### **4 The study proper**

#### **4.1 Rationale**

The data were extracted in a sentential context with a carrier sentence which was structured as *Nalobi……sikoyo* 'I say…now'. Participants had to fill in the dots with the missing word that was suggested by the picture. The measurement values of these data were extracted at three different points of each formant (the beginning, the middle, and the end), and the average values of each token were then computed automatically by Praat. It is these average values which were considered for statistical analysis.

#### **4.2 Participants**

Eighteen subjects were recruited to provide the data for this experiment. They were all native speakers of KL who were born and raised in Kinshasa. None of them had ever left Kinshasa. All of them could speak some French. Their ages ranged from 18 to 57 years. None of them had any articulation problems. They were all healthy at the time they performed the tasks of this study. Table 1 presents the demographics of the subjects of this experiment.


Table 1: The demographics of the subjects

#### **4.3 Research hypothesis**

Six pairs of vowels were compared in this study. The predictions of all the experiments were formulated on the basis of my assumptions which support the claim that when a segment does not exist in the linguistic system of the TL, the illicit segment is adapted to the closest form that exists in the linguistic system of the recipient language (see Kang 2011 for discussion).
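The "closest form" assumption can be made concrete as a nearest-neighbour search over feature bundles. The Python sketch below is only an illustration under strong assumptions: the four binary features, their values, and the reduced KL inventory /i, e, o, u/ (i.e. the merger scenario reported in the literature) are simplifications chosen for the example, and the unweighted feature count is not a model of ranked constraints. All names are hypothetical.

```python
FEATURES = ("high", "back", "round", "atr")

# Assumed feature values (True = '+', False = '-'), for illustration only.
KL_MERGED = {
    "i": (True, False, False, True),
    "e": (False, False, False, True),
    "o": (False, True, True, True),
    "u": (True, True, True, True),
}
BF_INPUT = {
    "ɛ": (False, False, False, False),
    "ɔ": (False, True, True, False),
    "ø": (False, False, True, True),
}


def distance(a, b):
    """Number of features on which two bundles disagree."""
    return sum(x != y for x, y in zip(a, b))


def closest(source, inventory=KL_MERGED):
    """Return the inventory vowels at minimal feature distance, sorted."""
    bundle = BF_INPUT[source]
    best = min(distance(bundle, v) for v in inventory.values())
    return sorted(k for k, v in inventory.items() if distance(bundle, v) == best)


for vowel in BF_INPUT:
    print(vowel, "->", closest(vowel))
```

Under these assumptions, `closest("ɛ")` returns `['e']` and `closest("ɔ")` returns `['o']`. For a front rounded input such as /ø/, the plain count ties between /e/ and /o/, so an unweighted distance is not enough on its own to select a single candidate.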

The predictions of this study are, for instance, that Belgian French mid vowels /ɛ/, /œ/, /ø/ will be adapted as /e/ with some initial input matching feature(s) preserved in the recipient language if and only if the said feature(s) also exist(s) in its phonological system; or with those features sacrificed if they are not preferred in the phonological system of the recipient language. That is, BF /ɛ/ would be adapted as /e/ in KL if and only if the phonological system of KL disprefers the phonological features of the input /ɛ/. On the assumption in the literature that /ɛ/ is merged into /e/ in KL (Motingea Mangulu 2006, Bokamba 2012, Campbell & King 2013), the closest KL form to the BF mid vowel would be [e], since [ɛ] is assumed to no longer exist in the system. Therefore, the null hypothesis predicts that BF [ɛ, œ, ø] would be adapted as [e] (H<sub>0</sub>: [ɛ] = [e], [œ] = [e], and [ø] = [e]) in KL, while the alternative hypothesis predicts that the Belgian French vowels [ɛ, œ, ø] would not be adapted as [e] (H<sub>1</sub>: [ɛ] ≠ [e], [œ] ≠ [e], and [ø] ≠ [e]) in KL.

Along the same lines, the null hypothesis predicts that [ɔ] will be adapted as [o] (H<sub>0</sub>: [ɔ] = [o]). That is, KL speakers will adapt [ɔ] as [o], since the latter vowel is the closest to the illicit input [ɔ]. The alternative hypothesis, by contrast, predicts that [ɔ] will not be adapted as [o] (H<sub>1</sub>: [ɔ] ≠ [o]).

I argue in this study that the adaptation of a foreign input that is illicit in the TL is a gradient process which abides by a number of constraints that are hierarchically ranked in the phonological system of the recipient language. Among these, the phonetic-phonological feature matching and mapping that observe the phonological preferences of the TL tend to be universally ranked higher. According to this perspective, the process of phonological adaptation tends to preserve the highly ranked, preferred foreign phonetic-phonological features in the phonological system of the TL. The hypotheses of the six pairwise tests are presented in Table 2.

#### **4.4 Data analysis**

The Praat program<sup>1</sup> was used to measure the F1 and F2 values in order to determine the height and the frontness/backness of the targeted vowels throughout the study. The F1 and F2 measurements, in hertz (Hz), were taken at three different points of the vowel spectrogram. A script also generated the average measurement values, which were used as input for the statistical analysis.
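The averaging step can be sketched as follows. This is not the Praat script used in the study; it is a minimal Python illustration which assumes that each token comes with F1 and F2 readings (in Hz) at the beginning, middle, and end of the vowel. All names and numbers in it are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Token:
    """Formant readings (Hz) at the beginning, middle, and end of one vowel token."""
    f1: tuple
    f2: tuple


def token_means(token):
    """Average the three F1 readings and the three F2 readings of a token."""
    return mean(token.f1), mean(token.f2)


# Made-up readings, for illustration only.
example = Token(f1=(400.0, 450.0, 500.0), f2=(1900.0, 2000.0, 2100.0))
print(token_means(example))
```

Each token is thus reduced to one F1 value and one F2 value before any statistical comparison is made.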

Due to the correlated nature of the data, Generalized Estimating Equations (GEE) were used to determine whether there were significant differences between two or more targeted variables. For instance, the F1 values of the adapted [ɛ] (i.e., KL vowel adapted from BF [ɛ]) were compared to the F1 values of adapted [e] (i.e., KL vowel adapted from BF [e]) to determine whether they were significantly different; the same procedures were used for the F2 values of the adapted [ɛ] and the adapted [e] to determine whether they were significantly different. The next section presents the results of different pair tests.
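For readers unfamiliar with the method, the general form of a GEE may be helpful. The following is the standard textbook formulation, not a specification of the exact model fitted here, and it assumes for illustration that the clustering unit is the speaker. With $y_i$ the vector of measurements from speaker $i$ ($i = 1, \dots, K$), $\mu_i(\beta)$ its model-based mean, and $R(\alpha)$ a working correlation matrix for measurements from the same speaker, the regression coefficients $\beta$ are obtained by solving

$$
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad D_i = \frac{\partial \mu_i}{\partial \beta},
\qquad V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2},
$$

where $A_i$ is the diagonal matrix of variance functions and $\phi$ is a scale parameter. Correlation among a speaker's repeated tokens is thus handled through $V_i$ rather than by treating all observations as independent.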

<sup>1</sup>Computer program, version 6.0.14, retrieved 11 February 2016 from http://www.praat.org/.


Table 2: Research hypothesis

#### **4.5 Evidence that participants were in KL linguistic mode when producing the target vowels**

How does a researcher make sure that bilingual or multilingual participants are in the target language linguistic mode when producing the target vowels in the study? Five types of evidence are presented here as robust evidence that the vowels produced by KL speakers in this study were indeed adapted KL vowels and not French vowels.

First, to make sure that the bilingual speakers were in KL linguistic mode (Grosjean 2001), but not in French linguistic mode, all the research was conducted in KL. That is, all the interactions and instructions were in KL. This technique was used by Grosjean in his study on mixed language processing, in which he strictly controlled for the language mode factor. He specifically told the participants that he was doing research on mixed language (code-switching, borrowing), he interacted with them in mixed language, and asked them to keep their two languages on at all times (pp. 412–413). Second, subjects in the current study were specifically asked to produce those words in KL the way they always do while speaking in Lingala. Specifying the language in which they had to produce those words, Kinshasa Lingala, put them in KL linguistic mode.

Grosjean (2001: 2) defines language mode as "[t]he state of activation of the bilingual's languages and language processing mechanisms at a given point in time". For example, if you ask a French-English bilingual speaker to produce the word *courage* in French, s/he will read it as /kuˈraʒ/, but not as /ˈkɜrɪd͡ʒ/ even if the subject speaks both French and English. Likewise, if you ask him/her to produce the same word in English, s/he will read it as /ˈkɜrɪd͡ʒ/, but not /kuˈraʒ/. The specified language and the word that is produced abide by a particular phonological system. This phonological match entails that when a bilingual speaker is asked to produce a word in a specific language, s/he regulates her/his speech production according to the linguistic mode of that particular language.

Grosjean (2000: 410) says, "At the monolingual end of the continuum, bilinguals adopt the language of the monolingual interlocutor(s) and deactivate their other language(s) as best as possible". The mere fact of specifying the language in the instruction helps the bilingual subject to switch into the appropriate language mode that is associated with the specific language. Grosjean further states, "when the bilingual is in a monolingual mode, one can assume that the other language is not activated." Green (1986: 412) even proposed that the other language is "inhibited".

Third, even if those words look French in their orthography, they have been fully integrated in KL and do not have an alternative in the language. For instance, if you ask a KL speaker to give you the Lingala word for 'a window' or 'a door', s/he will most certainly say *fenetre* 'window' or *porte* 'door' respectively. These words have been so fully integrated in the KL lexical system that the original words are simply lost and unknown to speakers.

Fourth, evidence that my subjects were in KL linguistic mode further came from the data of this study (the pilot study), in which five subjects were asked to produce those words first in Lingala, and then in French. In the first mini experiment, subjects were asked to produce 11 words in Lingala. In the second experiment, they were asked to produce those same words in French. The results are presented in Table 3.

Note that italic emphasis refers to the epenthetic vowels as produced by KL speakers. The results in Table 3 show that when KL speakers are in KL linguistic mode, they usually insert an epenthetic vowel after a coda to break the illicit sequence of CC within a word, resulting in the resyllabification of the coda into an onset. The same is observed with the final coda, which is resyllabified as an onset. In some cases, there is a shift of the stress from the left to the right, to the adjacent syllable, as in [ˈkalsɔ̃] → [kaˈlesɔ̃] and [ˈavjɔ̃] → [aˈvijɔ̃]. These results once more confirm that when the language to be used by the participants is specified in the instruction and the interaction takes place in a monolingual mode, the bilingual speakers switch to the linguistic mode of the target language and fully activate the base/matrix language, which in this case is KL. The differences in their production between BF and KL were rather intriguing, as shown in Table 3.


Table 3: Word transcription as produced by KL speakers in the pilot study

Fifth, I also used my intuition as a native speaker of KL to determine whether subjects were in KL linguistic mode or not. For instance, whenever a subject would insert an epenthetic vowel in a word to break the sequence of coda-onset, that was an intuitive indication that the subject was in KL linguistic mode, since in most cases KL does not allow a coda in a syllable. These linguistic realities are robust evidence that my subjects produced the target vowels in this study in KL.
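The cluster-breaking pattern used here as a diagnostic can be illustrated with a small Python sketch. It is deliberately simplified: the vowel set is a toy inventory, the quality of the epenthetic vowel is passed in as a parameter rather than predicted, and stress and word-final codas are not handled. The function names are hypothetical; the two inputs are the unstressed shapes of the forms [ˈkalsɔ̃] and [ˈavjɔ̃] cited in this section.

```python
import unicodedata

VOWELS = set("aeiouɛɔ")  # toy vowel set (an assumption of this sketch)


def is_consonant(ch):
    """A letter outside VOWELS counts as a consonant; diacritics do not."""
    return ch.isalpha() and ch not in VOWELS and not unicodedata.combining(ch)


def break_clusters(word, epenthetic):
    """Insert `epenthetic` between any two adjacent consonants."""
    out = []
    for ch in word:
        if out and is_consonant(ch) and is_consonant(out[-1]):
            out.append(epenthetic)
        out.append(ch)
    return "".join(out)


print(break_clusters("kalsɔ̃", "e"))
print(break_clusters("avjɔ̃", "i"))
```

With the epenthetic vowels supplied by hand, the sketch yields the segmental shapes of [kaˈlesɔ̃] and [aˈvijɔ̃]; choosing the vowel quality and placing the stress would require an actual phonological analysis.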

#### **4.6 The results**

The results of the Generalized Estimating Equation (GEE) comparing the F1 and F2 values of the pairs of the adapted target vowels are presented in Table 4.


Table 4: The results for the comparisons of the target vowels

*<sup>a</sup>*Estimated marginal means

The results in Table 4 reflect the pairwise comparisons of estimated marginal means based on the original scale of the dependent variables zF1 and zF2 for the pairs of target vowels. Table 4 also presents the results of the Wald chi-square test of the simple effects of vowel within each level combination of the other factors shown; this test is based on the linearly independent pairwise comparisons among the estimated marginal means for F1 and F2.
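The labels zF1 and zF2 suggest formant values standardized as z-scores. Assuming, for illustration only, that the standardization is carried out within each speaker using the sample standard deviation (the details are not stated here), it could be sketched in Python as follows. The function name and the numbers are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one speaker's formant values: (x - mean) / sample SD."""
    m = mean(values)
    sd = stdev(values)
    return [(x - m) / sd for x in values]


# Made-up F1 values (Hz) from one hypothetical speaker.
print(z_scores([400.0, 500.0, 600.0]))
```

Standardizing within speakers removes differences in overall formant range between speakers, so that vowel categories can be compared across the whole group.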

It should be noted that only the F1 results account for the raising of the target vowel in the adaptation process, which entails that merger or neutralization of two segments is accounted for by the F1 results alone. The F2 served to determine the degree of frontness or backness of the vowel. It also helped to account for the unrounding of any target vowel.

### **5 Findings of BF vowel adaptation in KL**

The global findings of this experiment show that the front and back mid-vowels /ɛ/ and /e/, and /ɔ/ and /o/, have been adapted as separate segments, [ɛ] and [e], and [ɔ] and [o], respectively. These pairs of vowels have been adapted in different phonetic spaces. This is evidence of the existence of a contrast between [+ATR] and [−ATR] in the phonological system of KL.

This adaptation process preserves the phonological features which are already licensed within a particular perimeter of the phonological/phonetic space of KL. Specifically, [−round], which is licensed within the front perimeter of the phonological/phonetic space of KL, is preserved, as is [+round] within the back phonetic space of KL.

The phonological adaptation of [ɛ] and [e] as two distinct segments in KL provides interesting evidence in favor of the existence of these segments as distinct phonological entities in the abstract representation of KL. If that were not the case, the contrast between these two segments would not have emerged. This finding on the non-merger of [ɛ] and [e] during their adaptation process in KL challenges the claim in the existing literature that these two vowels are merged in KL.

In the back phonetic space of KL, the BF pair of vowels /ɔ/ and /o/ has been adapted as two different segments in the system. The survival of the phonological contrast between /ɔ/ and /o/ offers interesting evidence against the claim in the existing literature that /ɔ/ and /o/ are merged in KL.

However, /œ/ has been adapted as both [e] and [ɛ]. This implies that /œ/ has been adapted within the loop of the intersection point that has been created through the process of contrast reduction of [e] and [ɛ]. It is assumed from this phonological adaptation that [ɛ] and [e] overlap at a certain point of their phonetic space, forming an intersection surface that is not wide enough for both sounds to merge. However, the intersection point of their overlap is the space within which [œ] is adapted. As a result, [œ] is adapted in their phonetic spaces, sacrificing initial features such as [+round]. Figure 1 illustrates the adaptation of [œ] into both [ɛ] and [e].

Figure 1: The adaptation of [œ] into both [ɛ] and [e]

Besides, /ø/ has been adapted in a different phonetic space from [e] and [ɛ]. The non-merger of /ø/ with either [ɛ] or [e] entails that KL has adopted a new segment in its phonological/phonetic system. This segment does not obey the phonological constraints of KL within the front parameter space of KL phonetic space. Such is the case of the combination of [+round] and [+front] with native data in KL.

It should be recalled that the adaptation process of [ø] into KL phonological/phonetic space reveals some exceptional cases which do not obey the phonological patterns of the adaptation process of BF vowels in KL. First, in all the cases, the adaptation process of a BF vowel in KL prohibits the combination of [+round] with [+front], yet this combination is possible only when [ø] is adapted in KL. Second, in all the aforementioned cases, the combination of [+ATR] and [+round] in the front-parameter of KL phonetic space is constrained, yet with the adaptation of [ø] these constraints are violated. Could we consider [ø] as a transparent segment which does not obey any constraints during its phonological adaptation process in KL?

I posit that a transparent segment is one which displays exceptional phonological behavior during a given phonological process, resulting in a violation of the identified and attested constraints that regulate the phonological process of a target language. Such is therefore the case of the phonological adaptation of the BF [ø] in KL.

It should also be pointed out that the fine-grained phonetic traces of the BF are preserved somewhat and can still be traced back in the phonetic space of KL. These original fine-grained phonetic traces of the BF that are still identified in KL phonetic space are to be considered as the linguistic vestige (remnant) of this adaptation process. This linguistic vestige is the case of [+round] of [ø], which has been preserved during the adaptation process. The findings of this study are illustrated in Figure 2.

Figure 2: Global findings of BF vowel adaptation in KL in terms of segments

The findings in Figure 2 could be otherwise represented and discussed in terms of features. Unlike the native data (automatically extracted data), which showed that KL does not accept [−ATR] in the back phonetic space of the KL linguistic system, the findings of the adapted vowels show the opposite. In fact, these findings show that both [+ATR] and [−ATR] are preferred in the system. However, the findings of the front phonetic space from native data disagree with the findings of the adapted vowels in KL. The KL native data system attests a preference for [+ATR] in the front phonetic space of the system, while the loan data system shows a preference for both [+ATR] and [−ATR]. These realities are represented in Figures 3 and 4.

Figure 3: Findings of BF vowel adaptation in KL in terms of [+/−ATR] features

Figure 4: Findings of BF vowel adaptation in KL in terms of [+/−ATR] and [+/−round] features

### **6 Conclusion**

The global findings of loanword adaptation in KL help to build a story in either support or rejection of a number of concerns that were raised in this study. They provide further evidence against the claim that KL speakers do not make any distinction between the mid-vowels [ɛ] and [e], and [ɔ] and [o].

In fact, the findings of this study have shown that KL speakers still discriminate between the pairs of vowels [ɛ] and [e], and [ɔ] and [o], which implies the existence of an underlying contrast between the features [+ATR] and [−ATR]. Such evidence could be used as recoverability diagnostic evidence for the process of non-merger of /o/ and /ɔ/ in KL native data as claimed in the existing literature. By showing through this study that the process of BF mid-vowel adaptation displays contrast, these findings help to further argue for the existence of the contrast even at the underlying representation of KL.

Referring back to the research questions that were raised earlier in this study, it is shown that the phonological system of KL primarily dictates its linguistic preferences in the phonological adaptation process of the loanwords in KL. The evidence comes from the fact that most of the adapted vowels from the BF obey the phonological preferences of KL. The common bundles of features are faithfully preserved to accommodate the loan lexical item in the linguistic system of KL. The preservation of similar features is evidence that similarity plays an important role in the adaptation of the loanwords in KL. The more similar, the more preferred. When the foreign input does not offer any similarity with the phonological system of the recipient language within a particular phonetic space, the illicit feature is simply sacrificed, as in the case of the adaptation of [œ] into both [ɛ] and [e], in which the dispreferred feature [+round] in the [−back] phonetic space of KL was sacrificed. However, it is noted that only the transparent segment [ø] has violated such a constraint, in that it has been adapted with its [+round] feature within the [−back] phonetic space of KL. Finally, it is shown that when a feature or feature combination in a foreign input vowel either presents similarities with a feature or feature combination in the recipient language phonological system, or else does not present any similarities to any feature or feature combination in the phonological system of the recipient language, the feature is preserved in case of similarity, but sacrificed in case of differences, with the transparent segment once more violating this constraint.

## **References**


# **Chapter 4**

# **The augment in Haya and Ekegusii**

Jonathan Choti

Michigan State University

This article examines the behavior of the augment in Haya (E22) and Ekegusii (E42), two Bantu Zone E languages, revealing many similarities and a few differences between the Haya and Ekegusii augment. In both languages, morphosyntactic, semantic, and pragmatic requirements regulate the behavior of the augment. The common shape of the augment is a vowel (V, namely /a/, /e/, and /o/). Besides, Ekegusii has the CV shape in *ri-* and *chi-* of class 5 and 10, respectively. Augmented nouns in both languages are the default but are ambiguous between a specific and non-specific reading. In Haya and Ekegusii, the augment is not marked on proper names, most kinship terms, and vocative nouns because these pick out specific referents. Nouns used as adverbs of location, time, and manner omit the augment in both languages. The two languages require the augment in predicative and associative constructions. In complex nouns, both elements require the augment but in compound nouns, only the first is augmented in both languages. The two languages allow the augment in gerunds but not in infinitives. Most pronominals require the augment in the two languages. Haya and Ekegusii disallow the augment in interrogative and negative constructions, proverbs, and nouns modified by 'any' to signal non-specific reference. In both languages, affirmative declaratives require the augment. Emphatic nouns in topic and contrastive focus positions require the augment to mark emphasis and specificity even in negative contexts. These features of the augment in Haya and Ekegusii confirm that the so-called augment is actually a bound article.

## **1 Introduction**

Jonathan Choti. 2022. The augment in Haya and Ekegusii. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 47–72. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393740

In a number of Bantu languages, common nouns and other nominals contain a stem-initial prefix that precedes the class prefix. This prefix is commonly known as the augment (or initial vowel or preprefix). However, some Bantu languages such as Swahili do not have the augment. De Blois (1970) presented a typological survey of the behavior of the augment in which he demonstrated its crosslinguistic variation and the semantic, syntactic, and other factors that regulate its behavior. He concluded, however, that his investigation was incomplete because of insufficient data. Subsequent studies on the augment have focused on its behavior in individual languages such as Dzamba (Bokamba 1971), Haya (Chagas 1977), Luganda (Ashton et al. 1987, Hyman & Katamba 1993, Ferrari-Bridgers 2009, Mould 1974), Kinande (Progovac 1993), Kagulu (Petzell 2003), IsiXhosa (Visser 2008), Kirundi (Ndayiragije et al. 2012), and Nata (Gambarage 2013, 2019). Some of these accounts maintained that the behavior of the augment is regulated by the semantics, i.e. definiteness and specificity (e.g. Bleek 1869, Bokamba 1971, Gambarage 2013, 2019, Givón 1969, Meeussen 1959, Mould 1974). Other studies argued that morphosyntactic requirements are the key determinants of its behavior (e.g. Dewees 1971, Hyman & Katamba 1993). The current study develops de Blois' typological account by comparing the behavior of the augment in Haya and Ekegusii of Bantu Zone E (Guthrie 1967–1971). The data in (1) and (2) illustrate the marking of the augment in the two languages:<sup>1</sup>

(1) Haya

	- a. o-mu-ana '1-child'
	- b. a-ba-ana '2-children'
	- c. e-ki-imba '7-bean'
	- d. o-ru-limi '11-tongue'

(2) Ekegusii

	- a. o-mo-nto '1-person'
	- b. a-ba-nto '2-people'
	- c. o-mo-te '3-tree'
	- d. ri-i-timo '5-spear'

In (1), the Haya augment occurs as *o*- (1a, 1d), *a*- (1b), and *e*- (1c) and is immediately followed respectively by the class prefixes -*mu*- (1a), -*ba*- (1b), -*ki*- (1c) and -*ru*- (1d). In (2), the Ekegusii augment is realized as *o*- (2a, 2c), *a*- (2b), and *ri*- (2d) before the class prefixes -*mo*- (2a, 2c), -*ba*- (2b), and -*i*- (2d), respectively. The third element in the examples is the nominal root.

<sup>1</sup>Any undocumented Haya data in this study stems from the Yoza dialect and was provided by Abdul Mutashobya. The Ekegusii data was provided by the author, who is a native speaker of the language. The author is thankful to Abdul Mutashobya for the Haya data.

<sup>2</sup>The numerals in the glosses indicate the noun class of the noun.

The goal of this article is three-fold. The first is to compare and contrast the behavior of the augment in Haya and Ekegusii. The second is to determine the semantic, pragmatic, and morphosyntactic properties of the augment in both languages. The third is to show that the behavior of the augment is consistent with that of articles in other languages. The rest of this article proceeds as follows. A review of previous accounts of the augment is presented in §2, formal properties of the Haya and Ekegusii augments in §3, and their grammatical properties in §4. The semantic/pragmatic account of the augment in Haya and Ekegusii is presented in §5, the augment's article properties in §6, and the summary and conclusion in §7.

## **2 Review of previous accounts of the augment**

In the literature, three aspects of the augment appear to be prominent: semantic, morphosyntactic, and typological properties. This review addresses each of these characteristics in a separate subsection.

#### **2.1 Semantic properties of the augment**

The primary characteristic of the augment is that it occurs in some contexts but not in others. Some of the earlier accounts of the augment maintain that its (non-)occurrence is determined by the contrast between (in)definiteness and/or (non-)specificity, as in Bemba (Givón 1969), Luganda (Ashton et al. 1987, Ferrari-Bridgers 2009, Mould 1974), Dzamba (Bokamba 1971), Kagulu (Petzell 2003), Kinande (Progovac 1993), Xhosa (Visser 2008), and Nata (Gambarage 2013, 2019). In Dzamba, Bokamba (1971: 220) concluded that the behavior of the augment is determined by the semantics, describing it as a referentiality and definiteness marker. The Dzamba examples in (3) demonstrate that the absence of the augment in *moibi* 'thief' (3a) and its occurrence in *omoibi* 'the thief' (3b) makes a semantic distinction.

(3)

	- a. mo-ibi (mɔɔ) anyɔlɔki ondaku 'A thief entered the house.'
	- b. o-mo-ibi (\*mɔɔ) anyɔlɔki ondaku 'The thief entered the house.'

Moreover, the Dzamba augment is obligatory in topicalized NPs and NPs modified by relative clauses and adjectives. Bokamba concluded that in these contexts, the augment marks specificity.

#### **2.2 Morphosyntactic properties of the augment**

Some of the previous studies of the augment analyzed it as part of inflectional morphology and attributed its behavior to syntactic or a combination of syntactic and morphological requirements (e.g. Dewees 1971, Hyman & Katamba 1993, Mould 1974). Hyman & Katamba (1993: 224) showed that in Luganda, two operators, negation and focus, license bare nouns (those without the augment) while nouns with the augment are self-licensing, as in the examples in (4):

(4)

	- a. tè-bááwà báànà bìtábó
	  NEG-they gave children books
	  'They didn't give children books.' (\*à-báànà è-bìtábó, \*à-báànà bìtábó, \*báànà è-bìtábó)
	- b. yàgúlà bìtábó (bìnó) 'He bought (these) books.' [Postverbal focus]

In (4a), the nouns *báànà* 'children' and *bìtábó* 'books' appear without the augment because they occur within the scope of negation. In (4b), the noun *bìtábó* 'books' appears as a bare noun in a postverbal focus position where it may cooccur with the demonstrative *bìnó* 'these'. The scopal relations between the bare NPs and the operators led Hyman & Katamba (1993) to attribute the absence of the Luganda augment to the syntax. They further argued that the Luganda augment cannot have any semantic correlates because it is an inflectional category with all the properties of inflectional morphology such as those proposed by Anderson (1988).<sup>3</sup> Hyman & Katamba's (1993) claim that only syntax (and not semantics or pragmatics) conditions the augment in Luganda is too strong. Besides, the fact that an augment is an inflectional category does not mean that it has no semantic or pragmatic correlates. Furthermore, negation and focus are semantic principles as well. Moreover, other studies such as Ashton et al. (1987: 30), Mould (1974), and Ferrari-Bridgers (2009) determined a semantic factor in the behavior of the Luganda augment. In addition, (non-)specificity in opaque contexts such as negative and interrogative constructions involves scopal relations that draw on syntactic and semantic principles (e.g. Abusch 1993, Lyons 1999, Winter 1997). The current study reconciles the morphosyntactic and semantic accounts of the augment and shows that the properties of the augment range from morphological, phonological, syntactic, and semantic to pragmatic ones.

<sup>3</sup>The relevant properties of inflectional morphology include: (a) configurational properties (i.e., sensitivity to syntactic configurations), (b) agreement properties, (c) inherent properties, and (d) phrasal properties (Anderson 1988).

#### **2.3 Typological properties of the augment**

De Blois (1970) categorized augment languages into three classes based on the factors that determine the behavior of the augment in the language. These factors include formal grammatical conditions, definable semantic function, and special functions. He attributed the absence of the augment in locatives, vocatives, compound nouns, kinship terms, proper names, predicative constructions, etc. to formal grammatical conditions. In languages where the augment has a definable function, it appears in nouns with determinate or particularized referents, but does not occur in other conditions. Languages in which the augment has special functions include Tswa, Ronga, Thonga-Shangaan, and Tonga. In these languages, it was observed that faster speakers used the augment while slower speakers did not. The data in (5–7) show de Blois' categories of augment languages:

	- a. Nyakyusa: nsoko 'your mother'
	- b. Rundi: nyookûru 'my grandmother'
	- c. Nande: tatâ 'my father'
	- d. Ganda: ssebo 'sir'

#### (6) Semantic function


#### (7) Special functions

Tonga: bazyala ciindi comwe … 'they simultaneously produced …' i-bana bakozyanya 'children who look like each other'

In (5), the kinship terms *nsoko, nyookûru, tatâ*, and the title *ssebo* occur without the augment. In (6a), the augment *a-* in *amaguta* 'the oil' marks definiteness while its absence in *maguta* 'some oil' (6b) signals indefiniteness. In (7), *i-bana* 'children' occurs with the augment *i*- after a pause. I noted earlier that both semantics and syntax have a bearing on the behavior of the augment in Luganda. Thus, de Blois' categorization of augment languages into three distinct classes based on its function is questionable. The rest of this article shows that the behavior of the augment may vary intra- and cross-linguistically due to syntactic, semantic, and pragmatic requirements.

## **3 Formal properties of the augment in Haya and Ekegusii**

This section presents basic facts about the shapes of the augment in Haya and Ekegusii, respectively. In Haya, the augment occurs consistently as a vowel (V), which is also the most common shape of the Ekegusii augment. The other shape of the augment in Ekegusii is a consonant-vowel sequence (CV) that occurs in class 5 and 10. The Ekegusii class 5 augment has two allomorphs, V and CV. In both languages, the augment is not realized in class 1b nouns (kinship terms) and class 21 nouns (proper names). The other similarity between the Haya and Ekegusii augment is that the V shape involves the three non-high vowels /a/, /e/, and /o/. However, the vowel of the CV augment in Ekegusii is /i/. The data in Table 1 show the respective shapes of the augments in Haya (Chagas 1977: 35) and Ekegusii (adapted from Cammenga 2002: 199).


Table 1: Augments in Haya and Ekegusii

In Table 1a, the Haya augment vowel exhibits harmony with the prefix vowel. The augment appears as the back vowel /o/ when the prefix vowel is the back vowel /u/, as in class 1, 3, 11, 13, 14, 15, 17, and 18. The Haya augment vowel occurs as the front vowel /e/ whenever the prefix vowel is the front vowel /i/, as in class 4, 5, 7, 8, 9, and 10. There is perfect harmony in the Haya augment and prefix vowel involving /a/ in class 2, 6, 12, and 16. In Table 1b, the Ekegusii augment vowel exhibits perfect harmony with the prefix vowel in most of the cases as well. For example, it appears as /o/ whenever the prefix vowel is /o/, as in class 1, 3, 11, 14, and 15. The Ekegusii augment vowel appears as /a/ before class prefixes with the vowel /a/, as in class 2 and 6. The Ekegusii augment /e/ occurs before prefixes with front vowels /e/ and /i/, as in class 4, 5, 7, 8, and 13. The Ekegusii vowel in the CV augment of class 5 appears as /i/ before the prefix vowel /i/.<sup>4</sup> It is possible to relate the Ekegusii augment vowels /e/ and /i/ of class 9 and 10 to Proto-Bantu /e-Ni-/ (Chagas 1977: 35). Apparently, Haya and Ekegusii exhibit many similarities in the formal properties of their augments despite the CV shape observed in Ekegusii. Besides, Haya locative classes 16, 17, and 18 have an augment but the single locative class in Ekegusii (i.e. class 16) does not, and thus is not included in Table 1b. Table 2 summarizes the formal shapes of the augment in Haya and Ekegusii.


Table 2: Shapes of the augment in Haya and Ekegusii
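The harmony pattern described for the Haya augment can be restated as a small lookup. The following Python sketch is illustrative only: the names `HAYA_AUGMENT_FOR_PREFIX_VOWEL`, `haya_augment`, and `augmented_noun` are hypothetical, the mapping encodes just the three generalizations stated in the text (/o/ before /u/, /e/ before /i/, /a/ before /a/), and it assumes that the class prefix is supplied as a plain string containing its vowel. It does not attempt to model the Ekegusii CV augments or prefixes whose vowel is not overt.

```python
# Illustrative sketch: encodes only the Haya harmony pattern stated in the text.
# All names here are hypothetical and are not taken from the source.

HAYA_AUGMENT_FOR_PREFIX_VOWEL = {
    "u": "o",  # back prefix vowel /u/ -> augment /o/
    "i": "e",  # front prefix vowel /i/ -> augment /e/
    "a": "a",  # prefix vowel /a/ -> identical augment /a/
}


def haya_augment(class_prefix: str) -> str:
    """Return the augment vowel predicted for a Haya class prefix."""
    # Scan from the right so that the vowel of a CV prefix is found first.
    for segment in reversed(class_prefix):
        if segment in HAYA_AUGMENT_FOR_PREFIX_VOWEL:
            return HAYA_AUGMENT_FOR_PREFIX_VOWEL[segment]
    raise ValueError(f"no harmonizing vowel found in prefix {class_prefix!r}")


def augmented_noun(class_prefix: str, root: str) -> str:
    """Build a hyphen-segmented augment-prefix-root form."""
    return f"{haya_augment(class_prefix)}-{class_prefix}-{root}"


if __name__ == "__main__":
    # Prefix and root pairs taken from the Haya forms in example (1).
    for prefix, root in [("mu", "ana"), ("ba", "ana"), ("ki", "imba"), ("ru", "limi")]:
        print(augmented_noun(prefix, root))
```

Run on the prefix and root pairs from (1), the sketch yields the segmented forms given there. A prefix with no overt vowel raises an error rather than guessing, since the text relates such cases to an underlying prefix vowel that the string alone does not show.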

<sup>4</sup>A reviewer suggested that the Ekegusii class 5 allomorph ri- of the augment might be due to a metathesis rule /i-ri-/ → [ri-i-] to get rid of word-initial /i/. The lowering of Proto-Bantu initial /i/ to [e] created the other allomorph /e/.

## **4 Grammatical properties of the augment in Haya and Ekegusii**

This section addresses the behavior of the augment in Haya and Ekegusii across different morphosyntactic environments, especially those identified in de Blois (1970) and elsewhere. The relevant contexts include proper names and kinship terms, complex nouns, compound nouns, predicative constructions, adverbial nouns, verbal nouns (gerunds and infinitives), associative constructions, nouns modified by determiners such as 'other' and 'every', adjectives, relative clauses, and vocatives. Morphosyntactic contexts that are ambiguous between syntax and semantics appear in §5 and include nouns modified by 'any', and nouns in negative and interrogative constructions. This section is organized into six subsections.

#### **4.1 Augment in proper names and kinship terms**

Kinship terminology refers to "the system of names applied to categories of kin standing in relationship to one another" (The Editors of Encyclopaedia Britannica 2017). This definition implies that kinship terms, similar to proper names, pick out specific referents because these terms denote categories of kin standing in relationship with one another. Well-known examples of kinship terminology include 'father', 'mother', 'sister', 'brother', 'wife', and 'husband'. In both Haya and Ekegusii, many kinship terms do not take the augment. Additionally, the augment is not marked on proper names, even those derived from common nouns that take the augment. The examples in (8a–8c) illustrate these facts.

#### (8) Augment in proper names and kinship terms



In (8a), the Haya augment is marked on common nouns *entale* 'lion', *emilembe* 'blessings', *oburungi* 'beauty', and *omukama* 'king'. However, the proper names derived from these common nouns omit the augment. They include *Ntale*, *Milembe*, *Burungi*, and *Mukama*. In Ekegusii, proper names *Sese*, *Kerandi*, *Sigara*, and *Nyanchera* occur without the augment but their corresponding common nouns take it, i.e. *e-sese* 'dog', *e-kerandi* 'gourd', *e-sigara* 'cigarette', and *e-nchera* 'path'. In (8b), Haya kinship terms that occur without the augment include *mae* 'mother', *tata* 'father', *isho* 'your father', *nyoko* 'your mother', *mae enkuru* 'my grandma', and *tata enkuru* 'grandpa'. Ekegusii examples include *tata* 'my father', *baba*, *mama* 'my mother', *iso* 'your father', *nyoko* 'your mother', *magokoro* 'grandma', and *sokoro* 'grandpa'. The data in (8c) show that some kinship terms do take the augment in both Haya and Ekegusii. Note that these kinship terms are modified by possessive pronouns to identify the two kin involved. Thus, the grammars of the two languages dictate that proper names and most kinship terms occur without the augment and augmented kinship terms take a possessive modifier. Moreover, the absence of the augment in these nouns is also conditioned by the semantics since kinship terms and proper names have identifiable referents. Hence, the absence of the augment in such nouns signals their referential role. Longobardi (1994) showed a similar situation regarding the behavior of Italian articles in proper names. I return to the semantic and pragmatic properties of the augment in §5.

#### **4.2 Augment in complex and compound nouns**

This subsection explores the behavior of the augment in complex and compound nouns in Haya and Ekegusii. The relevant complex nouns are those denoting the idea 'having/possessing' while compound nouns comprise an agentive noun and its object or complement. Haya uses the phrase 'owner(s) of x' for a complex noun denoting the idea 'having/possessing'. On the other hand, Ekegusii expresses the same idea with the structure 'owner(s) x'. Nonetheless, in both languages the augment is required on both elements of the complex noun. The data in (9) illustrate the marking of the augment in Haya and Ekegusii complex nouns:

	- a. Haya: o-mukama w'e-nju 'house-owner'
	- b. Haya: o-mukama w'e-mbwa 'dog-owner'
	- c. Haya: o-mukama w'eichumu 'spear-owner'
	- d. Ekegusii: o-monyene e-nyomba 'house-owner'
	- e. Ekegusii: o-monyene e-sese 'dog-owner'
	- f. Ekegusii: o-monyene ri-itimo 'spear-owner'

The apostrophe (') in (9a–9c) indicates vowel coalescence (or vowel deletion) that occurs across a morpheme boundary as a hiatus resolution strategy in Haya and other Bantu languages. The compound nouns examined consist of an agentive deverbal noun and its common noun complement. In these structures, the first element of the compound (i.e. agentive noun) takes the augment but the second element loses the augment in both Haya and Ekegusii as seen in (10):

(10) Augment in compound nouns


f. Ekegusii: o-mokami mbori (\*chi-mbori = goats) 'goat-milker'

#### **4.3 Augment in predicative and associative constructions**

A predicative construction is a noun or adjective that follows a linking verb and provides information about the subject of the sentence (Aarts 2011). In Bantu languages, the associative construction refers to the structure 'of + noun/possessive pronoun' that functions to express possession or association between two nouns, one being the head noun and the other a modifier in a prepositional phrase. The modifier noun acts as a complement of the preposition 'of'. In this subsection, we examine the behavior of the augment in nouns that follow linking verbs and 'of' in predicative and associative constructions, respectively. The data in (11) illustrate the marking of the augment in Haya and Ekegusii predicative constructions:

(11) Augment in predicative constructions


In (11), the linking verb 'to be (is, are)' occurs as *n* + vowel, but this vowel is deleted or coalesces with that of the augment whereas in slow speech, this vowel is realized as a replica of the augment vowel. In the literature, the augment in this environment is called a *latent augment* (e.g. de Blois 1970). The data in (11) show that the augment occurs in predicative constructions in both languages.

The data in (12) show that the augment in Haya and Ekegusii is also retained in associative constructions. Both the head noun (first noun) and the modifier (second noun and complement of the preposition) must take the augment. In the second noun, it is realized as a latent augment in Ekegusii (this is also the case in Haya but in faster speech):

#### (12) Augment in associative constructions


#### **4.4 Augment in adverbial nouns**

In the literature, adverbial nouns are nominals that function grammatically as adverbs to modify verbs. Such nouns typically provide a range of information including location, time, and manner. This subsection deals with the behavior of the augment in these nominals. First, I will consider locative nouns. The data in (13) demonstrate the behavior of the augment in locative nouns in Haya and Ekegusii, respectively:

(13) Absence of augment in locatives


In (13), Haya and Ekegusii locative nouns drop the augment, but Haya substitutes the augment with locative prefixes that express such meanings as 'in', 'on', and 'at'. There are no such locative prefixes in Ekegusii locative nouns.

As for the grammatical function of expressing time or temporal information, the data in (14) show that both Haya and Ekegusii adverbial nouns of time drop the augment as well. Note that Haya examples have adverbial prefixes that are lacking in the Ekegusii data:

(14) Absence of augment in temporal nouns


In Haya and Ekegusii, adverbial nouns that express manner omit the augment. Unlike locative and temporal nouns, nouns that denote manner in Haya do not have adverbial prefixes. Consider the data in (15):

(15) Absence of augment in adverbial nouns of manner


f. Ekegusii: bokong'u (o-bokong'u = difficulty) 'firmly, hard'

#### **4.5 Augment in deverbal nouns (gerunds and infinitives)**

Deverbal nouns are nouns derived from verbs, for example, gerunds and infinitives. In the literature, a gerund is a noun derived from a verb that retains some verb-like properties such as taking a direct object and adverbial modifiers. The infinitive is the form of the verb with *to*, or its equivalent, in front of the verb or prefixed to the verb. In Haya and Ekegusii, the equivalent of *to* is the prefix *ku*- and *ko-/go*-, respectively. The gerund in the two languages takes the augment /o/ before the infinitive prefix *ku*-. Thus, in Haya and Ekegusii grammars, the augment occurs in gerunds but not in infinitives. Both gerunds and infinitives in the two languages occupy grammatical roles of nouns such as subject and object in a sentence. The gerunds in (16) take the augment in subject position but the infinitives in (17) occupying the same position do not:

(16) Augment in subject gerunds



#### (17) Absence of augment in subject infinitives


f. Ekegusii: gosoma n'eriogi ngwancheire 'to read aloud is acceptable'

The data in (16–17) indicate that the augment is obligatory in gerunds but dropped in infinitives. It appears that the behavior of the augment distinguishes between gerunds and infinitives in Haya and Ekegusii. It is possible to argue that the variation of the augment in (16) vs. (17) is due to the syntactic position of subject. However, the data in (18–19) reveal the same pattern when gerunds and infinitives occur in object position. The augment is obligatory in object gerunds and dropped in infinitives, parallel to the forms in (16–17) involving subject position.

(18) Augment in object gerunds


The data sets in (16–19) affirm that Haya and Ekegusii gerunds take the augment while infinitives do not. In addition, gerunds take determiners such as possessive pronouns (18a–18b, 18d–18e) while infinitives take adverbial modifiers such as *burikilo* 'daily' (19a) and *botambe* 'always' (19d). This means that gerunds exhibit more noun-like properties while infinitives exhibit more verb-like properties. Yet, both deverbal nouns may occupy subject and object positions.

#### **4.6 Noun+modifier 'one', 'two', 'each', 'every', 'other'/'another', 'all'**

The behavior of the augment in nouns and/or their determiners or modifiers may also reveal some of its typological properties (de Blois 1970). I examine the behavior of the Haya and Ekegusii augments in nouns and modifiers such as numerals 'one' and 'two', 'each' or 'every', '(an)other', 'all', 'many', 'few', and 'whole'. The examples in (20) illustrate the status of the augment in these forms in both Haya and Ekegusii:

(20) Augment in noun+modifier constructions



The forms in (20) show four similarities and five differences in the behavior of the Haya augment and Ekegusii augment. In both, the augment is marked only on the head noun and not on the modifiers 'two' (20b), 'all' (20f), 'few' (20h), and 'whole' (20i). However, in the noun+'one' (20a) and noun+'many' (20g) NPs, the Ekegusii augment appears on both the noun and the modifier whereas the Haya augment occurs only on the noun. Additionally, in (20c–20d), the Haya augment occurs on both the noun and '(an)other' while the Ekegusii augment occurs only on the noun. In (20e), the noun and 'every' omit the augment in Haya while only 'every' omits it in Ekegusii.

The other noun+modifier constructions pertinent to the analysis of the augment involve adjectives and relative clauses. The examples in (21) show the behavior of the augment in these contexts.

#### (21) Augment in noun+adjective, noun+relative clause constructions


d. e-bikombe e-bili enja e-bikombe bire isiko 'the cups that are outside'

In (21), the augment is compulsory on the noun and adjective in both languages (21a–21b). However, in noun+relative clause NPs (21c–21d), the Ekegusii augment is marked only on the noun and not on the relative clause, while the Haya augment appears on both the noun and the relative clause.

In Haya and Ekegusii, modifiers or determiners may function as pronouns, occurring in an NP without the head noun. Thus, the modifier acts as the head of the noun phrase in the absence of the noun. These forms are also significant in understanding the behavior of the augment. The data in (22) show the behavior of the augment in Haya and Ekegusii pronominals:

(22) Augment in pronominals


In (22), Haya and Ekegusii exhibit the same pattern. In both languages, 'all' (22a), 'two' (22g), and 'any' (22h) do not take the augment while the rest of the pronominals do. It is noteworthy that Ekegusii '(an)other' and the relative clause do not take the augment in the presence of the head noun in (20c–20d) and (21d), but as pronominals they do, as in *e-bire isiko* (22e) and *e-binde* 'other' (22f). In (22), the augment signals specificity since pronominals have identifiable antecedents. However, some pronominals occur without the augment in (22) due to idiosyncratic properties.

## **5 Semantic and pragmatic properties of the augment in Haya and Ekegusii**

I noted earlier that the retention vs. deletion of the augment in other Bantu languages correlates with definiteness/specificity vs. indefiniteness/non-specificity, as in Dzamba (e.g. Bokamba 1971). The relevant contexts include negative constructions, questions, noun+'any' constructions, vocatives, emphatic nouns, focalized nouns, and proverbs. Some of these contexts create clear non-specific readings while others create both non-specific and specific interpretations, depending on the context (i.e. pragmatics). A specific interpretation obtains when a noun denotes a particular referent and a non-specific interpretation when the noun refers to a general class (Lyons 1999: §4). The two kinds of readings are possible in both transparent and opaque contexts. In transparent contexts, ambiguity between a specific and non-specific reading does not involve scope relations. Conversely, opaque contexts involve scope relations created by operators such as negation, questions, verbs of propositional attitude (e.g. want, believe, hope, intend), conditionals, modals, and future tense (Lyons 1999: 166–78). This investigation includes negation and interrogation as opaque contexts. The next two subsections focus on these contexts.

#### **5.1 Augment in affirmative vs. negative constructions**

Negation is one of the operators that create opaque contexts, i.e. contexts in which both a specific and a non-specific interpretation are possible (Lyons 1999). To determine the behavior of the augment in negative constructions, we must also examine its behavior in affirmative contexts. In Haya and Ekegusii, the augment is obligatory in affirmative constructions (23a–23c) but absent in negative constructions (23d–23f). The data in (23) illustrate this variation of the augment:

#### (23) Augment in affirmative vs. negative constructions


In affirmative constructions (23a–23c), the augment is required: Haya nouns such as *e-kikombe* 'cup' and *o-muyo* 'knife' and Ekegusii nouns such as *e-gekombe* 'cup' and *o-moyio* 'knife' take the augment. The same nouns drop the augment in corresponding negative constructions (23d–23f). The Haya and Ekegusii facts in (23d–23f) align with the Kinande data that led Progovac (1993) to conclude that bare nouns in Kinande behave as negative polarity items (NPIs). Beyond the NPI view, the bare forms in (23) receive a non-specific interpretation in that they describe a class of entities as opposed to specific entities while the augmented forms are ambiguous between a non-specific and specific interpretation. Disambiguation of the augmented forms depends on the context and is thus subject to the principles of pragmatics. For example, if Ekegusii speaker A asks B, "Which one between the cup and the knife do you want?" B may reply, "It's the cup" (*n'egekombe*). In this context, *e-gekombe* 'the cup' refers to a particular cup, identifiable to both A and B. However, if A is in a different room, hears an object fall in the kitchen where B is and asks B, "What fell?" B may respond by saying, "It's a cup" (*n'egekombe*). In this context, B describes the type of object that fell but not a specific cup identifiable to both speakers. Therefore, augmented forms in Haya and Ekegusii may describe types of entities or pick out particular ones. In §5.3 below, I show that the augment is required in emphatic nouns occurring in negative constructions because they refer to specific referents and occupy syntactically salient positions such as topic and contrastive focus.

#### **5.2 Augment in interrogatives vs. declaratives**

Besides exploring the behavior of the augment in interrogative constructions, I will also examine its behavior in declarative constructions. The data in (24) demonstrate that the augment is omitted in both Haya and Ekegusii interrogatives:

#### (24) Absence of the augment in interrogatives


In (24), the interrogative morpheme is *ki* 'which' in both languages. Haya nouns *mbuzi* 'goat', *kikombe* 'cup', *mwana* 'child', and *musigazi* 'boy' drop the augment in interrogative constructions similar to the Ekegusii nouns *mbori* 'goat', *gekombe* 'cup', and *momura* 'boy.' Speakers use the questions in (24) to seek information on the identity of the nouns' referents. Thus, the referents of the nonaugmented nouns are not known by the speaker, i.e. are non-specific. I showed in (23) that augmented forms are ambiguous between a specific and non-specific reading. The forms in (25) show that the augment is compulsory in positive declaratives.

#### (25) Augment in positive declaratives



In the literature, demonstratives and possessives are analyzed as characteristically definite (e.g. Lyons 1999: §3). Hence, the use of 'this' and 'my' in (25) implies that the augmented nouns are definite and refer to specific referents. Consequently, the two kinds of determiners help disambiguate the augmented forms. In (25), the augment combines with the demonstrative 'this' and possessive 'my' to mark definiteness or specificity in Haya and Ekegusii. In this context, definite NPs refer to specific referents.

#### **5.3 Augment in emphatic nouns**

Constituents of a sentence or utterance that occupy topic and contrastive focus positions are treated as emphatic in the literature. Therefore, these elements receive linguistic prominence of different kinds depending on the language (e.g. Gundel & Fretheim 2004). The general view is that *topic* is *given information* that ranks higher in the referentiality or specificity scale. The same is true for constituents in the contrastive focus position. Contrastive focus refers to material that the speaker calls to the hearer's attention and that normally stands in contrast with other entities that might fill the same position (Gundel & Fretheim 2004: 181). Therefore, constituents in topic and contrastive focus positions receive linguistic and referential emphasis. Zimmermann et al. (2008) explain that speakers "use additional grammatical marking, e.g., intonation contour, syntactic movement, clefts, or morphological markers to signal contrastive focus." In addition, this special marking corresponds with emphatic marking in descriptive and typological accounts in some languages. Topic and contrastive focus are relevant to the analysis of the behavior of the augment. I showed earlier that the augment in Haya and Ekegusii undergoes deletion in negative constructions. However, nouns in topic and contrastive focus positions retain the augment in the context of negation. I posit that the two languages use the augment as an emphatic marker and argue that emphatic nouns express specific reference. The data in (26) illustrate this pattern:

(26) Augment in topic and contrastive focus NPs



In (26a–26c), the nouns for 'goat', 'cup', and 'child' function as discourse topics and for this reason must retain the augment in the context of negation. These nouns are also left dislocated to show that they are in topic position. Examples (26d–26e) illustrate the retention of the augment in contrastive focus positions in spite of negation. In (26d), the nouns for 'dog' are in contrastive focus with something else not included in the discourse. In (26e), the nouns for 'chicken' and 'dog' are in contrastive focus positions. The data in (26) confirm that Haya and Ekegusii use the augment as a morphological marker of topic and contrastive focus. In these contexts, the augment encodes emphasis in the relevant constituents. Besides morphological and syntactic marking, these emphatic nouns receive phonological prominence in the two languages (this point is not explored).

#### **5.4 Augment in vocatives**

Vocatives refer to "phrases used in direct address" (Lyons 1999: 152). Proper nouns denoting persons, kinship terms, and second person pronouns typically function as vocatives. Some accounts treat vocative as grammatical case and many languages have special vocative forms. Lyons explained that there is a great tendency for vocatives to be bare or exhibit morphological minimality. This tendency appears to obtain in Haya and Ekegusii where common nouns in vocative case drop the augment. The data in (27) illustrate this behavior in the augment:

#### (27) Absence of augment in vocatives


In (27), the Haya nouns *mwana* 'child', *baana* 'children', *musigazi* 'boy', and *mwisiki* 'girl' and the Ekegusii nouns *mwana* 'child', *baana* 'children', *momura* 'boy', and *moiseke* 'girl' lose the augment in the vocative function. Given that referents of vocative phrases are contextually identifiable, the logical conclusion is that the augment is not needed in this context to signal specificity. The behavior of the augment in vocatives resembles its behavior in proper names and kinship terms derived from common nouns (see §4.1).

#### **5.5 Augment in noun+'any' constructions**

The determiner or pronoun 'any' is used to refer to one or some of a thing or number of things and to express a lack of restriction in selecting one of a specified class. This means that nouns modified by 'any' receive a non-specific interpretation. In English, *any* is also used in questions (e.g., *Do you have any money?*) and negative constructions (e.g., *I don't have any money*). Both contexts imply non-specificity. I also showed that negative (§5.1) and interrogative (§5.2) constructions disallow the augment in Haya and Ekegusii on the same grounds. The non-specific interpretation inherent in 'any' makes it significant from the perspective of semantics. The augment behaves alike in Haya and Ekegusii noun+'any' phrases. In both languages, the nouns modified by 'any' lose the augment, as in the examples in (28):

#### (28) Augment in noun+'any' constructions


In (28), the Haya nouns *e-mbuzi* 'goat', *e-kikombe* 'cup', *o-mwana* 'child', and *o-mugeni* 'guest' drop the augment similar to their Ekegusii counterparts *e-mbori* 'goat', *e-gekombe* 'cup', *o-mwana* 'child', *o-mogeni* 'guest', *ri-itimo* 'spear' and *e-nyomba* 'house'. The omission of the augment suggests that the bare nouns are non-specific in their reference, which is reinforced by 'any'. Note that in both languages, 'any' occurs after the head noun and takes agreement prefixes. The data in (28) typify a kind of non-specificity agreement between the bare noun and 'any'.

#### **5.6 Augment in proverbs**

The inclusion of proverbs in this study stems from the fact that nouns used in proverbs do not have specific referents. Instead, such nouns denote a class of entities. In both Haya and Ekegusii, the augment is absent in nouns used in proverbs. Consider the data in (29):

(29) Absence of augment in proverbs


d. Ekegusii: mwana obande mmamiria makendu 'someone's child is cold mucus'

In (29a–29b), the Haya nouns *njubu* 'hippo', *bwato* 'boat', *balezi* 'baby sitters', and *mwana* 'child' occur without the augment in the two proverbs. Similarly, Ekegusii nominals *mominchoria imi* 'one who braves dew', *mosera ibu* 'one who stirs ash', *mwana* 'child', *mamiria* 'mucus', and *makendu* 'cold' omit the augment. The bare nouns in (29) do not have specific referents. Therefore, the data in (29) provide additional evidence that the absence of the augment reflects the nonspecific interpretations in particular contexts.

## **6 Augment as an article**

The previous sections have shown that the behavior of the augment in Haya and Ekegusii is consistent with that of articles in other languages. There is no evidence in the Bantu literature that negates this observation. Therefore, the terms *augment*, *initial vowel*, and *preprefix* used to describe this formative are misnomers and thus misleading. In this section, I explain some of the key properties of articles evident in the augment. In reference to the two English articles *the* and *a*, Lyons (1999: 36) defines an article thus:

(30) Definition of article

The basic unmarked nature of *the* and *a*, with their minimal semantic content [+Def] and [+Sg] respectively, reflected in their phonological weakness and default behavior, I shall take to be what defines the term article.

In (30), the features [+Def] and [+Sg] represent respectively *definite* and *singular* and thus characterize English *the* and *a* as definite and cardinality articles. Lyons's definition also identifies phonological weakness as a core property of articles. This property means that phonologically, articles are dependent on adjacent elements and are mostly monosyllabic (Giusti 1997). Besides, articles are also morphologically dependent or bound (i.e., clitics, affixes), form closed classes, inflect for number, gender and case, occur in NPs, and correlate with (in)definiteness and/or (non-)specificity (Giusti 1997). The augment reveals these traits in Haya and Ekegusii.

I highlight five properties of articles evident in the Haya and Ekegusii augments. First, in both languages, the augment occurs as a monosyllabic prefix whose phonological shape depends on the vowel of the class prefix. This underlines the fact that the augment is phonologically weak and dependent on the adjacent host, similar to articles in other languages. Cross-linguistic evidence shows that articles are prone to phonological reduction processes, with the article and the host forming word-like units that function as a full lexical form. Second, the augment in Haya and Ekegusii occurs as a bound morpheme, not an infrequent quality in articles across languages. Lyons (1999: 63) explains that articles may exist as either independent words (e.g. English *the*) or bound morphemes. Bound articles may occur as clitics (e.g. Spanish *el* in *el hombre* 'the man') or affixes (e.g. Romanian *-ul* in *om-ul* 'man-the'). Thus, the augment shares the property of bound morphemes with articles of languages such as Spanish, Romanian, Arabic, and Hausa (Lyons 1999) and is best treated as a bound article. These facts reveal the flaw in using the imprecise terms *augment*, *preprefix*, and *initial vowel* to describe this formative: the terms make it look like a foreign element that has no equivalents in other human languages.

The third characteristic of articles observed in the behavior of the augment is that it occurs as a single formative with variants that constitute a closed class in both Haya and Ekegusii. This is a common property of functional categories such as determiners. In Haya, the augment has three variants /a/, /e/, and /o/ and in Ekegusii five variants /a/, /e/, /o/, /ri/, and /chi/. The variants of the augment exhibit identical behavior across various contexts similar to variants of articles in other languages such as *a* and *an* of the English indefinite, cardinality article. The limited number of variants of the Haya and Ekegusii augments parallels that of articles in languages that have them. While some languages have no articles (e.g. Swahili, Latin, and most Slavic languages), other languages have one article (usually the definite article, e.g. Bulgarian and Modern Greek) and yet others have two (e.g. English) (Giusti 1997, Lyons 1999). The fourth property of articles found on the augment is that it inflects for number, gender, and case; the augment in Haya and Ekegusii alternates to indicate singular/plural distinctions, noun class (or gender), and case (e.g. locative and vocative). The fifth attribute of articles apparent on the augment is its association with (non-)specificity and (in)definiteness. The augment is absent in proper names, kinship terms, and vocatives because these nouns are definite and specific as they pick out specific referents. Elsewhere, the omission of the augment signals a non-specific reading, as in negative constructions, questions, proverbs, and nouns modified by 'any'. The augment is compulsory in emphatic nouns in topic and contrastive focus positions to mark specificity. In neutral or transparent contexts, the marking of the augment is ambiguous between a specific and non-specific reading but contextual variables help disambiguate the NPs in question.

## **7 Summary and conclusion**

This article has shown many similarities and a few differences between the Haya and Ekegusii augments, determined the grammatical, semantic, and pragmatic properties of the augment, and highlighted the properties of articles found in the augment. The common shapes of the augment are vowels (V) /a/, /e/, and /o/ in both languages though Ekegusii also has the CV shape in /ri/ and /chi/. In both languages, the augment is not marked on proper names, most kinship terms, and vocative nouns because these are definite and specific in reference. Augmented nouns in both languages are ambiguous between a specific and non-specific reading in transparent contexts. Adverbial nouns of location, time, and manner omit the augment in both languages. They also require the augment in predicative and associative constructions. In complex nouns, both elements of the NP take the augment in Haya and Ekegusii. However, in compound nouns, only the first element is augmented in both languages. Haya and Ekegusii allow the augment in gerunds but not in infinitives. The languages exhibit some similarities and differences in the marking of the augment in the head noun and its modifiers. Most pronominals require the augment in the two languages, but omit it in interrogative and negative constructions to signal non-specific reference. In both languages, affirmative statements require the augment but the meaning of the noun varies between a specific and non-specific interpretation. Emphatic nouns in topic and contrastive focus positions require the augment to mark emphasis and specificity. Nouns in proverbs and those modified by 'any' drop the augment to express non-specific reference inherent in these contexts.

The findings from Haya and Ekegusii confirm that the augment is indeed a bound article. The augment in these languages exhibits five properties found in articles in other languages. First, it occurs as a monosyllabic prefix whose phonological shape depends on the vowel of the class prefix. Second, as a prefix, the augment is morphologically dependent on its host. Third, the various shapes of the augment in Haya and Ekegusii constitute a closed class. Fourth, both languages use the augment to mark the grammatical properties of number, gender, and case. Lastly, the augment interacts with semantic and pragmatic principles to express respectively definite and (non-)specific interpretations.

## **References**


# **Chapter 5**

# **Learning Swahili morphology**

## John Goldsmith & Fidèle Mpiranya

University of Chicago

We describe the results of automatic morphological analysis of a large corpus of Swahili text, the Helsinki corpus, using Linguistica, an unsupervised learner of morphology. The result is a fine-grained analysis, with some results corresponding to the familiar linguistic analysis, and with others that are possible only with exact quantitative measures available with computational analysis. The prefixal inflectional morphology is largely done well, while the suffixal morphology is successfully analyzed in some cases and not in others.

### **1 Introduction**

In this paper we would like to explain some of the things that we have learned from a project on the learning of morphology. "Learning of morphology" in this context means using an algorithm which takes a large amount of text from a language, and draws conclusions about what are the roots, affixes, and principles of word construction (from roots and affixes) in this particular language. The crucial fact to bear in mind is that the algorithm is to have no prior knowledge of the language that we give to it. Whether the language is English or is Swahili, the learning algorithm starts from the same point; any differences that it draws between the two derive entirely from the data, and not from anything that we have given to the algorithm.

That sounds like a tall order, and in some ways it is. But we can offer the following as motivation for this work. When we teach an introductory course on linguistics, we always reserve the second class on morphology for an experiment. We begin by putting a word on the board: *ninasema*, but we do not tell them this is from Swahili. We ask if anyone knows what it means or what language it comes from; if someone does know, we tell them to be quiet for the rest of the class. Then we ask everyone else to divide it into morphemes. There is silence, of course, because the students think they have no idea what the right answer is. Then we ask them to guess how many morphemes there are here: one? two? more? Students guess there are at least two morphemes, and if pressed, typically offer a cut into *nina-sema.*

John Goldsmith & Fidèle Mpiranya. 2022. Learning Swahili morphology. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 73–105. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393738

Then we write *unasema*, and ask them if this allows them to change their minds. Everyone with an opinion opines that the correct cuts are *ni-nasema* and *u-nasema*. When we ask why they do not like *nina-sema* and *una-sema* – which, after all, would allow them to keep the guess that they started out with, when they knew only *ninasema* – they do not know why, but they are pretty sure that *ni/u* + *nasema* is right.

Then we consider a third word, *anasema*, and the students feel confirmed in their judgment after the second word, since they can easily extend their hypothesis to *ni/u/a* + *nasema*. Again, we ask them why they do not want to go for *nina/una/ana* + *sema*, and although they cannot say why exactly, they are pretty confident that this last hypothesis is not right, because it is missing something.

The next word is *ninaona*, and the students easily conclude that there is a break after *nina* (comparing *ninasema* and *ninaona*) and furthermore, the word should be divided up as *ni-na-ona*. The next two words we offer are *ninampiga* and *tunasema*. The first they break up as *ni-na-mpiga*, and the second as *tu-nasema*. How about *ninawapiga*? That must be *ni-na-wa-piga*, and then they realize we must go back and reanalyze *ninampiga* as *ni-na-m-piga*. So far we have what we see in Figure 1.

The point to bear in mind is that the students have done this without being told what the Swahili words mean in English. At some point we explain that in other linguistics courses, the teacher gives their students the same words along with their English translations, but we tell them that we do not think it is necessary to know the meanings of the words to find the morphemes, and that the external form (which is to say, the spelling) is enough to discover the right morphological structure. By the end of the class, we have analyzed about 30 Swahili words, and found the right structure, at which point we tell them what the various morphemes mean in English, and briefly show the template of the Swahili verb, as in Figure 1. We have indicated tense markers with double and verbal roots with single underlining, respectively, for the reader's convenience, here and below.

From this we draw the conclusion that it is possible to learn Swahili morphology even when you do not know another language to compare it to. This is a welcome conclusion, because this is a task that all Swahili-speaking children must undertake when they are two or three years old.
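The comparison that the students carry out in this exercise can be mimicked with a very small procedure: line up the words and see what they share at the right edge or at the left edge. The following Python sketch is only an illustration of that classroom reasoning, not the Linguistica algorithm; the function names `shared_prefix`, `shared_suffix`, and `cut_before_shared_suffix` are hypothetical helpers introduced here for the purpose.

```python
def shared_prefix(words):
    """Return the longest string that every word starts with."""
    prefix = words[0]
    for word in words[1:]:
        # Shrink the candidate until the current word starts with it.
        while not word.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def shared_suffix(words):
    """Return the longest string that every word ends with."""
    reversed_words = [word[::-1] for word in words]
    return shared_prefix(reversed_words)[::-1]


def cut_before_shared_suffix(words):
    """Split each word into (what differs, what is shared at the right edge)."""
    suffix = shared_suffix(words)
    return [(word[:len(word) - len(suffix)], suffix) for word in words]


# Step 1: ninasema, unasema, anasema share "nasema", leaving ni / u / a.
print(cut_before_shared_suffix(["ninasema", "unasema", "anasema"]))

# Step 2: ninasema and ninaona share "nina" at the left edge.
print(shared_prefix(["ninasema", "ninaona"]))
```

The first call yields *ni/u/a* + *nasema*, and the second finds *nina*; putting the two observations together is what leads to the cut *ni-na-sema*. Nothing in the sketch consults a meaning or a translation, which is the point of the classroom experiment.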

Figure 1: Sketch of Swahili verb structure

Table 1: Blind analysis


But how exactly do the students do this? If we say they just use common sense, that is certainly true, but common sense is notoriously difficult to analyze. Furthermore, there is every reason to believe that this particular task is part of the language-learning capacity, so we have a very real professional interest in puzzling out exactly what this learning process is. It is for this reason that we began to develop an algorithm that learns the morphological structure of words, but with no access to meaning. In fact, we have developed several different approaches (and we are still looking for the best one), all of which we put under the umbrella name *Linguistica* (Goldsmith 2001, 2006, 2010).

How well can an automatic morphology analyzer deal with Swahili today? It has a long way to go before we can see it as comparable to a freshman taking a linguistics course using common sense. Still, there is a lot that it does right – which is to say, there are a lot of linguistic generalizations that it does observe. While there are interesting and complex matters of morphophonology in Swahili (loss of a vowel before another vowel, *ky* becoming *ch*, etc.) we can approach the problem of morphology before solving problems of morphophonology. What Swahili offers is a large range of affixes of similar sizes, and it is quite a challenge for an algorithm to break down a word into morphemes.

The present paper offers an overview of how Linguistica analyzes Swahili words. Its value at this point is not that it does a better job of analysis than a human, but rather that it can do a careful study of the morphology of a text so that we can ourselves more easily discover what it is that we are looking at. Linguistica serves as a tool to let us better understand what the data are that we are looking at when we explore a language on a large scale.

We believe that this work will be of interest to readers in different categories. For the linguist who knows Swahili, the results are interesting because such a linguist sees what can be learned in an explicit procedure of this sort. For such a linguist, Linguistica really serves as a microscope through which the details of the language emerge – almost visually. For the linguist interested in morphology, the work is of interest for what it says about the linguistic analysis of morphemes. The central operation here is – so to speak – *split*; the challenge for language learning is finding the pieces that the grammar of language organizes. One way to say this is that we are figuring out where a language-learner performs a "split" operation, turning an unanalyzed string into its component pieces. *Split*, in this sense, is the flip-side to the process of *Merge*, so much discussed in the current literature. *Merge* makes sense only once we have developed the broad strokes of the rule of *Split*!

### **2 Earlier work in this area**

There was a good deal of work in computational morphology in general, and in automatic learning of morphology in a number of cases, during the last decade of the 20th century and the first decade of the present century, including significant work on Swahili and a number of other Bantu languages. Some of this was motivated by an interest in automatic learning of morphology (see the overviews in Goldsmith 2010 and Goldsmith et al. 2017), and much was motivated by practical goals. These goals included developing morphological analyzers that could be used in speech recognition systems and in machine translation, and also methods that could be used to parse very large Swahili corpora, such as the one developed as the Helsinki Corpus of Swahili, notably the SALAMA project described by Hurskainen (1992, 1999, 2004).

The SALAMA project developed the linguistic resources that made this work possible, notably the corpus of Swahili that contained over 300,000 distinct words. Without the resources that this project created, our work would not have been possible at all.

De Pauw & De Schryver (2008) provide an overview of much of that work, and they discuss work on a number of languages, including Northern Sotho, Zulu, Xhosa, and Tswana (see also De Pauw et al. 2009). There was a special issue on African Language Technology in *Language Resources and Evaluation* in 2011 which provided coverage of work that was being done at that time. As just noted, much of the work had practical goals in mind, such as improving speech recognition (Gelas et al. 2012) and machine translation systems, and in such a context, focusing on the development of a system that operates with no language-particular knowledge is a luxury item that has few rewards. Several researchers, such as Gelas et al. (2012), applied one of the Morfessor systems, and some applied one of the Linguistica systems. Lindén (2008) explores predicting unseen Swahili words; see also Muhirwe (2007).

Several efforts have included "data-driven" learning, including De Pauw et al. (2006) on Swahili and De Schryver & De Pauw (2007) on Northern Sotho. Unsupervised learning is discussed for Luo (Nilotic) in De Pauw et al. (2010), and for Gikuyu in De Pauw & Wagacha (2007). Lindén (2008) discusses semi-supervised lemmatization of Swahili.

### **3 What is a morphological analysis?**

What is a morphological analysis? This question is not anodyne; our answer to it determines what we expect of our learning algorithm. In much of the work in this area over the last 20 years, the answer has been that a morphological analysis is a division of each word into the pieces called *morphs*. We would like to accomplish more than that; we would like to discover more of the principles that determine the order and the distributional possibilities of roots and affixes.<sup>1</sup>

We can learn an extremely important lesson by looking at what the students did as they examined the Swahili words and proposed an analysis, based purely on form. They became convinced that they had found the right pattern, one that really gets at something *true* about the data, when they discovered sets of morphemes that take the form in Figure 2 for English or Swahili. Of course, the patterns there hold not just for these two verb stems, but for a very large set of stems, and this is even more true in the case of Swahili.

$$\left\{ \begin{array}{c} \text{jump} \\ \text{laugh} \end{array} \right\} \left\{ \begin{array}{c} \text{ed} \\ \text{ed} \\ \text{s} \end{array} \right\}$$

$$\left\{ \begin{array}{c} \text{ni} \\ \text{u} \\ \text{a} \end{array} \right\} \left\{ \begin{array}{c} \text{na} \\ \text{m} \\ \text{m} \end{array} \right\} \left\{ \begin{array}{c} \text{ou} \\ \text{wa} \\ \text{pig} \\ \text{pig} \end{array} \right\} \left\{ \begin{array}{c} \text{sem} \\ \text{j} \\ \text{pig} \end{array} \right\} \left\{ \begin{array}{c} \text{a} \\ \text{a} \end{array} \right\}.$$

Figure 2: Two pieces of morphology

These representations show clearly how a good morphological analysis factors out the excess information that would be present if we were to simply list all of the relevant words. Morphological analysis starts with words, identifies redundancies, and uses those redundancies to create a representation in which what is stated is the essence of the grammatical description, that is, what makes the language what it is. In the present case, the word *redundancy* means needless repetition of a string of phonemes (or letters).

<sup>1</sup>The view that the study of words is the study of how words are composed of morphs and morphemes was regarded by most American linguists of the first half of the 20th century as the greatest American contribution to linguistics, second only to the setting of the phoneme on a firm methodological basis.


Table 2: Class-based prefixes

So how can we devise an algorithm to accomplish this? As our linguistics students showed us, there is a great deal to be learned from comparing pairs of words, which is what they did as we gave them words, one at a time. Zellig Harris, in a famous paper (Harris 1955), suggested that a good estimate of the likelihood of a morpheme break could be devised if we take an alphabetized list of words, and with each word, we trace through it one letter at a time, asking, after the first *n* letters, how many different letters those first *n* letters were followed by in our particular corpus from the language. For example, after *jum*, two letters (*p* and *b*) were found in a corpus we were looking at recently (from *jump* and *jumble*), while after *jump*, four letters followed (*space*, *e*, *i*, and *s*), and after *jumpi* only one letter follows (*n*).
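The counting that Harris proposed can be illustrated with a short sketch. This is not Linguistica's code; the function name `successor_frequencies`, the end-of-word symbol `#` (standing in for the *space* in the example), and the five-word list are all assumptions made for illustration.

```python
from collections import defaultdict

def successor_frequencies(words):
    """Map each word-initial string to the set of symbols that follow it.

    The end of a word is recorded as '#', so a string that is itself a
    complete word counts "end of word" as one of its successors.
    """
    followers = defaultdict(set)
    for word in words:
        padded = word + "#"
        for i in range(1, len(padded)):
            followers[padded[:i]].add(padded[i])
    return followers

# Toy word list (hypothetical, not the corpus referred to in the text).
words = ["jump", "jumped", "jumping", "jumps", "jumble"]
followers = successor_frequencies(words)
for start in ("jum", "jump", "jumpi"):
    print(start, len(followers[start]), sorted(followers[start]))
```

On this toy list the successor counts after *jum*, *jump*, and *jumpi* are 2, 4, and 1, matching the pattern described above: the count rises at the plausible morpheme boundary and falls again inside the suffix.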

Harris believed that by measuring this *successor frequency* we could find good candidates for morpheme breaks, and he was right. But the strings that we discover in this way are only candidates; many of them are not at all morphemes, and many morphemes are not discovered by Harris's method (or rather, by his methods). We hesitate to show the reader what can go wrong; it may cause them to wonder why we are using these methods. Here is a summary of the first stage of the algorithm that Linguistica employs. If we are seeking suffixes:


If we are seeking prefixes, we do much the same, except that we do it in the reverse direction. We scan each word from right to left, looking to see how many different letters *precede* each string reaching to the end:


This account assumes that we already know where the points are where the successor frequency (or predecessor frequency) is greater than 1, and it turns out that there is a simple way to find all those points, in all the words, and it requires much less work than one might imagine. First, alphabetize the list of words, and then go through that list looking only at pairs of words that are adjacent on the list (such as *walked, walking*, for example). Scan the two words from left to right (if we are looking for suffixes) or right to left (if we are looking for prefixes), and stop at the first point where the two words differ by a letter. The algorithm takes what is to the left of that point as a potential stem, and it then moves on to the next pair of words. This process is both simple and fast, from a computational point of view.
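The adjacent-pair shortcut can be sketched as follows. The names `shared_start` and `potential_stems` and the `seek` parameter are hypothetical, and the word lists are toy data. For prefixes the sketch reverses each word before sorting, which is one way of alphabetizing from the right edge.

```python
def shared_start(a, b):
    """Return the longest string with which both a and b begin."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

def potential_stems(words, seek="suffixes"):
    """Collect the shared edge of every pair of words adjacent in sorted order."""
    vocab = set(words)
    if seek == "prefixes":
        # Reverse the words so that sorting lines up their right edges.
        vocab = {w[::-1] for w in vocab}
    ordered = sorted(vocab)
    found = set()
    for first, second in zip(ordered, ordered[1:]):
        shared = shared_start(first, second)
        if shared:  # the two neighbours have something in common
            found.add(shared[::-1] if seek == "prefixes" else shared)
    return found

print(potential_stems(["walked", "walking", "walks", "jumped", "jumping"]))
print(potential_stems(["nilisoma", "ulisoma", "alisoma"], seek="prefixes"))
```

Only neighbouring words are compared, so after the sort the work is a single pass over the list, which is why the procedure is cheap.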

It is not right to say that the algorithm is finding stems at this point. We will let it analyze Swahili to find prefixes, and in so doing, we are finding the leftmost set of morphemes, and treating everything that follows as an unanalyzed whole, which we call for now simply a "potential-stem." In fact, that potential-stem contains many morphemes within it; we are now engaged in simply slicing off the leftmost prefixes of the words in Swahili, and we have just called what follows a "potential stem." We will continue to cut the potential-stem down to smaller morphs as that becomes possible. Thus in most of the cases we look at below, the "potential" stems that are computed are themselves analyzable into morphs (at a later stage in the computation, as well as in our heads). In order to avoid the ambiguity of the phrase "potential stem," we will create a new term, *parastem*, to refer to this. A signature is composed of a set of affixes and a set of parastems, and the parastems may themselves be analyzed further in additional signatures. A parastem that can be broken down no further is a stem.

As we observed at the beginning of this section, within the community of computational linguists working on the problem of automatic learning of morphology, different researchers have begun with different assumptions about what the task is. Some linguists have focused on the problem of segmentation, which means dividing a word up into successive morphs, while others (perhaps skeptical about the notion of morph or morpheme) seek to tag any given word with the morphosyntactic features that it bears. Our work falls in the former group – that is, we are very concerned with finding the proper analysis of a word into consecutive morphs. In addition, we would like to provide an analysis of how morphemes relate to one another in a word. Traditionally, linguists have spoken about relationships in praesentia, relations between morphemes that appear in a given word, and relationships in absentia, which is to say, the way in which multiple morphemes are alternatives to one another in a particular position in a word's morphology. We are interested in learning as much as possible about this aspect of a language's morphology as well.

Our interest here is exploratory. We are not in a rush to develop a practical tool; we have the opportunity to take some time and look at what kind of evidence regarding linguistic structure can be found by looking carefully at language data.<sup>2</sup>

### **4 Morphology of the left edge of the Swahili verb**

We used the Helsinki corpus of Swahili, which has about 300,000 distinct words. When we applied the current Linguistica algorithm above to 300,000 words to find prefixal signatures, we found 3,434 signatures; when we added an entropy-based filter,<sup>3</sup> 1,235 signatures remained, and it is this set of signatures that we will describe.

In some ways, using Linguistica is a bit like using a microscope, and just like when we use a microscope for the first time, it takes a bit of experimenting before the picture comes sharply into focus. Let us begin our tour, then, with a rough statement of the position of morphemes in finite Swahili verbs, and a summary of what Linguistica gleans from a large corpus.

<sup>2</sup>The algorithm that we employ here has the following stages. We will describe the process of prefix discovery, and the mirror image of it is used for suffix discovery. First, we alphabetize the list of words from the right end of each word; then we look at each pair of adjacent words on this list and find the rightmost letter at which the two words disagree. We take the material to the right of that point as a *protostem*. For each word *w* that ends with protostem *t*, we take *e* to be *t*'s extension if *w* = *e* + *t* (i.e., if *e* is what precedes *t* in word *w*). We call each set of extensions to a protostem a *protosignature*. We collect all protosignatures that are associated with at least two protostems. We create a set of signatures, each of which consists of a collection of extensions and all of the stems which occur with exactly those extensions in the corpus. If all of the stems in a signature end with the same letter or string of letters, that letter or string of letters is moved from the stems to the extensions. Two further functions are used to identify licit morphemes in the extensions in the system used here.
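The stages listed in footnote 2 can be sketched in a few lines for the prefix case. This is a simplified illustration, not Linguistica's implementation: the names `shared_end` and `prefix_signatures` and the `min_stems` parameter are hypothetical, the word list is toy data, and the last two stages (moving shared material from stems to extensions, and the two further functions) are left out.

```python
from collections import defaultdict

def shared_end(a, b):
    """Return the longest string with which both a and b end."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]

def prefix_signatures(words, min_stems=2):
    """Group protostems by the exact set of extensions that precede them."""
    vocab = set(words)
    # Stage 1: alphabetize from the right end of each word.
    ordered = sorted(vocab, key=lambda w: w[::-1])
    # Stage 2: each adjacent pair contributes its shared right edge as a protostem.
    protostems = set()
    for first, second in zip(ordered, ordered[1:]):
        shared = shared_end(first, second)
        if shared:
            protostems.add(shared)
    # Stage 3: a protostem's extensions are whatever precedes it in some word
    # ('' is the null prefix); protostems with identical extension sets are pooled.
    pooled = defaultdict(set)
    for stem in protostems:
        extensions = frozenset(w[:len(w) - len(stem)] for w in vocab if w.endswith(stem))
        pooled[extensions].add(stem)
    # Stage 4: keep extension sets shared by at least `min_stems` protostems.
    return {ext: stems for ext, stems in pooled.items()
            if len(ext) >= 2 and len(stems) >= min_stems}

words = ["nilisoma", "ulisoma", "alisoma", "nilipiga", "ulipiga", "alipiga"]
for extensions, stems in prefix_signatures(words).items():
    print(sorted(extensions), sorted(stems))
```

On the toy list the only surviving signature pairs the prefixes *a-, ni-, u-* with the two parastems *lisoma* and *lipiga*; the protostem *a*, shared by every word, is discarded because no second protostem has the same set of extensions.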

<sup>3</sup>That filter is roughly this: when the algorithm makes a prefix cut, in light of what we have said so far, it is because as we scan from right to left, a spot is found where there are two alternative options: e.g., as we scan *kitabu* and *vitabu* from right to left, a break will be created before the common stem *-itabu*, though this is in fact wrong. Indeed, throughout the corpus, when a signature *k=v* would be uncovered, it will always be followed by exactly one phoneme option: the vowel *i*. The location of true morpheme breaks always involves options both to the left and to the right (which is to say, both prior in time and forward in time). We measure this notion of *options* by the entropy of the final letter right before and right after a proposed morpheme break, and if (as with *k-itabu, v-itabu*) zero entropy is discovered (which is just a way of saying that only one letter is present), the algorithm proceeds to create other splits until a non-zero entropy is found. In the case of *kitabu*, this means adding the break *ki-tabu, vi-tabu*, which has a non-zero entropy after the break, since many different letters follow *ki-* and *vi-*, such as we see in *kilima, vilima* 'hill, hills'.
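The entropy test in footnote 3 can be made concrete with a small sketch. It measures only the letter right after a proposed prefix break, not the letter before it, and the function names `letter_entropy` and `entropy_after` and the four-word list are assumptions for illustration rather than Linguistica's own code.

```python
import math
from collections import Counter

def letter_entropy(letters):
    """Shannon entropy, in bits, of a collection of letters."""
    counts = Counter(letters)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def entropy_after(words, prefixes):
    """Entropy of the letter immediately following any of the proposed prefixes."""
    following = [w[len(p)] for w in words for p in prefixes
                 if w.startswith(p) and len(w) > len(p)]
    return letter_entropy(following)

words = ["kitabu", "vitabu", "kilima", "vilima"]
print(entropy_after(words, ["k", "v"]))    # only 'i' ever follows the break
print(entropy_after(words, ["ki", "vi"]))  # both 't' and 'l' follow the break
```

A break after *k-/v-* yields zero entropy, so it would be rejected, while a break after *ki-/vi-* yields a positive value, which is the situation the footnote describes.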

The textbook description of Swahili is much as given in Figure 3, while Linguistica's conclusions for the left side and the right side of the Swahili verb are given in Figures 4 and 5. The first of these concerns the initial subject marker position, the following tense marker position, and the position after the tense marker. It does not properly distinguish object markers, such as the *-ki-* in *ni-li-ki-som-a* 'I read it' from the relative clause marker *cho* in *ki-tabu ni-li-cho-ki-soma* 'the book that I read'.

[subj marker][tense marker][rel cl marker][object marker][verb root][extensions][final vowel]


Figure 3: Sketch of Swahili verb morphology

Figure 4: Summary of Linguistica's analysis of the left half of the Swahili verb

Independently of prefix discovery, Linguistica analyzes the right end of the word, and the major part of its conclusions are summarized in Figure 5.

However, the template given in these two figures gives only a superficial summary of Linguistica's analysis. Let us consider what happens with each slice of the analysis.

Figure 5: Three signatures from Linguistica's analysis of the right half of the Swahili verb

There are 1,235 signatures that emerge from the first iteration of prefixal analysis. What do these signatures say? What can we learn from them? It is natural to sort them in some way so that the most interesting signatures will appear at the top of the list, and there are several ways of sorting that come to mind. We might, for example, sort the signatures by the number of stems they contain, or we might sort them by the number of affixes. Both present us with interesting material. From a linguist's point of view, sorting them by the number of affixes is by far the more interesting. Figure 6, which is presented to the user by Linguistica, shows how we can arrange the signatures in a lattice, where signatures with the same number of affixes appear on the same row, and in which the signatures in each row are sorted by decreasing numbers of stems (though we have departed from that latter point a bit to make the figures easier to read here).
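The row-by-row arrangement just described amounts to a two-level sort, which the following sketch makes explicit. The function name `lattice_rows` and the three toy signatures are hypothetical; signatures are assumed to be stored as a mapping from a set of affixes to a set of stems.

```python
from collections import defaultdict

def lattice_rows(signatures):
    """Group signatures by number of affixes; within a row, most stems first."""
    rows = defaultdict(list)
    for affixes, stems in signatures.items():
        rows[len(affixes)].append((affixes, stems))
    # Rows run from the largest affix count down, as in the lattice figure.
    return {size: sorted(row, key=lambda item: len(item[1]), reverse=True)
            for size, row in sorted(rows.items(), reverse=True)}

# Toy signatures (invented for illustration, not taken from the corpus).
signatures = {
    frozenset({"a", "ni", "u"}): {"lisoma", "lipiga"},
    frozenset({"a", "u"}): {"likaa"},
    frozenset({"a", "ni"}): {"lienda", "lifika", "lisema"},
}
for size, row in lattice_rows(signatures).items():
    for affixes, stems in row:
        print(size, sorted(affixes), len(stems))
```

Sorting by stem count alone would put the most productive signatures first, whereas grouping by affix count first keeps the signatures that cover the most paradigm cells at the top.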

We need to explain carefully what the relation is between the several figures and tables given here. Each box in Figure 6 is a signature, and each corresponds to an individual *row* in Table 4. Each of these signatures corresponds to an individual *number* that appears in the rightmost field of Table 3, and the reader can see the correspondence by comparing the number which indicates the number of stems in each signature.

In Table 3, each line corresponds to a row in Figure 6 – top row to top row, and then down from there. Each of the lines in the next table, Table 4, corresponds to *one* of the individual signatures tallied in Figure 6 or Table 3. The first row in Table 4 corresponds to the one signature with 14 affixes, and the five signatures in row 2 in Table 5 are the next five signatures in Table 3 (or Figure 6).

Figure 6: Top of the lattice of word-initial signatures


Table 3: Word-initial signature counts


Table 4: Examples of word-initial signatures

#### **4.1 The top signature and its parastems**

The very top signature in Figure 6 is the largest signature, in that it has a set of 14 prefixes: ∅*-a-i-ki-ku-li-m-ni-tu-u-vi-wa-ya-zi*; it does not contain the locative marker *pa-*.<sup>4</sup> The vast majority of signatures beneath it are composed of subsets of those 14 prefixes. Each of the five signatures on the row for 13 affixes is missing one prefix from the one in row 14 (in both Table 3 and Figure 6), just as each of the signatures in row 12 is missing one from those above it, and these differences between the affixes comprising the signatures are marked on the lines in the figure. We have given 18 signatures in Figure 6 out of the total set of 1,235.

Let us dig a little more deeply, and look at the parastems that are found in the top signatures (recall that a parastem is an element in a signature which may be analyzed in a later signature). Each parastem is sorted by, and listed with, its total occurrence count in the corpus in Table 5. One does not see much that stands out with this frequency sorting, but we have sorted the stems alphabetically in Table 6. Here we have *manually* underlined the tense markers and double-underlined the verb roots for the reader's benefit, and manually indicated morpheme breaks (these breaks have not been discovered by Linguistica yet).


Table 5: Parastems of the longest signature, sorted by frequency

<sup>4</sup>We consider the absence of *pa* to be an error committed by Linguistica.


Table 6: Parastems of the longest signature alphabetized from left to right

There are some interesting points here (though once we see them, we are inclined to say to ourselves, Oh yes, I should have thought of that). First of all, there are several monosyllabic elements, including three that are monomorphemic *na, le, we*, often referred to as *pronominal stems* (which can be monomorphemic or bimorphemic) in the Swahili literature. *na* can be translated as 'with,' and it is used to indicate possession (*a-na kitabu* 'he has a book', where *a-* is the Class 1 verbal prefix). *-le* is a demonstrative stem 'yonder' which is preceded by a pronominal prefix, and *-o* is a demonstrative stem 'near you.' *We* is a stem that marks 2nd person singular; we return to Linguistica's treatment of pronominal stems below.

From a quantitative point of view, there are two points of interest. The most frequent item in the list of parastems, *na*, has a ridiculously high count at 554,554; as we just noted, SM + *na* expresses possession (*na* could be translated as *with*). Other than this one item, the rest of the parastems reflect a frequency distribution in keeping with a Zipfian distribution, as we find in most of the other signatures as well (we return to this immediately). It is striking, as well, that there are no parastems with frequency below 91.

Let us digress for a moment on an interesting point. Word distribution in Swahili is Zipfian, as it is in other languages, which means that there are a very large number of words that occur very rarely: just once. That proportion is around half: about half of the words in a wordlist drawn from a corpus occur only once. The number of words that occur only twice in the entire corpus is about two-thirds of that, and over a relatively large range of words of high frequency, the observed frequency is inversely proportional to the rank of the word on the word-frequency list: the *n*th word on the frequency-ranked list has a frequency around 0.06/*n*. This breaks down for words of low frequency, but it summarizes well what frequencies we can expect among the most frequent words of a language. This distribution holds for stems as well.
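The rank-frequency relation just stated can be written out with a few worked values. Here $f(n)$ is read as the relative frequency of the word at rank $n$ (its share of all tokens), which is an assumption about the intended sense of "frequency"; the symbols $f$, $c$, and $N$ are introduced only for this illustration.

$$f(n) \approx \frac{0.06}{n}, \qquad f(1) \approx 0.06, \quad f(10) \approx 0.006, \quad f(100) \approx 0.0006$$

Under the same reading, in a corpus of $N$ tokens the expected count of the word at rank $n$ is roughly $c(n) \approx 0.06\,N/n$, so doubling the rank halves the expected count.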

There is a sense in which we can speak of the placement of potential words in this lattice in a dynamic fashion. Suppose we complete our morphological analysis of Swahili on our corpus, and then we go back to the beginning of the corpus and consider each word, knowing now its internal morphological structure but not knowing at any given moment anything about its future appearances in the corpus. In this mental experiment, we can watch any individual parastem as it climbs up through the lattice as we proceed further and further down the corpus. Let us say that we are observing the stem *t*; we watch it first appear with a given prefix *p*<sub>1</sub>, and then later with prefix *p*<sub>2</sub>; this puts the stem on the second row of the lattice, inside the signature with those two prefixes *p*<sub>1</sub> and *p*<sub>2</sub>. When it appears later with a third prefix *p*<sub>3</sub>, it moves up again, and it might eventually get to the top, once the parastem has appeared with all of the prefixes that it possibly can.
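This thought experiment can be simulated directly. The sketch below assumes the corpus has already been analyzed into (prefix, parastem) pairs; the function name `climb` and the toy stream of observations are hypothetical.

```python
def climb(stem, observations):
    """Return the successive prefix sets of `stem` as the corpus is read in order.

    `observations` is a sequence of (prefix, parastem) pairs in corpus order.
    A new entry is recorded only when the stem appears with a prefix it has
    not been seen with before, i.e. when it moves up one row of the lattice.
    """
    seen = set()
    history = []
    for prefix, observed_stem in observations:
        if observed_stem == stem and prefix not in seen:
            seen.add(prefix)
            history.append(frozenset(seen))
    return history

# Toy stream of analyzed words (invented for illustration).
observations = [
    ("ni", "lisoma"), ("a", "lipiga"), ("u", "lisoma"),
    ("ni", "lisoma"), ("a", "lisoma"),
]
for step in climb("lisoma", observations):
    print(len(step), sorted(step))
```

The length of each recorded set is the row of the lattice the parastem occupies at that point; repeated occurrences with an already-seen prefix leave it where it is.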

Why don't all parastems appear at the very top of the lattice, then? The first answer is because language is Zipfian, and most parastems do not occur very often – certainly not 14 times, the minimal number of occurrences needed for a parastem to get to the top of the lattice. A second reason is that not all parastems want to occur with all the different noun classes (so to speak); a stem built from a verb which requires a human subject is likely to occur primarily or only with class 1 and 2 markers,<sup>5</sup> and a lot of verb roots have this property, and similar remarks hold for other subgroups of verb and adjectival roots.

The result of this is that it is of linguistic interest to see how quickly a given parastem moves up this lattice, and which ones get stuck in the lattice somewhere below the top row. They get stuck if they are infrequent, or they get stuck if there is a reason why they should not appear with all of the noun classes. End of digression!

#### **4.2 The 2nd signature:** ∅*-a-i-ki-ku-li-m-ni-u-vi-wa-ya-zi*

<sup>5</sup>Reality is more complex than this comment suggests; for example, there are a number of stems, such as *rafiki* 'friend', that have two plural forms, *rafiki* and *ma-rafiki*, where *ma-* is the class 6 prefix.

After the first signature, with 14 affixes, we find that the next 5 longest signatures all contain 13 prefixes. A moment's thought will tell us that we should have expected that there would be 14 different signatures here, rather than only 5; after all, there are 14 different ways to contain one fewer affix than the total of 14; put another way, there are 14 different ways to select 13 affixes from a set of 14. But in fact there are only 5 such signatures, rather than 14. If we retain the dynamic image described just above, we can imagine that the stems that are in these 5 signatures are those which failed to reach the top because they each failed to get one of the 14 affixes, and it would be reasonable to expect that these would be the 5 with the lowest frequency. Here is what we find; the missing affixes are *tu*, ∅, *ku*, *ya*, and *vi*.<sup>6</sup>
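The counting argument can be checked mechanically. The sketch below enumerates every way of dropping one prefix from the top signature and compares that with the five missing prefixes reported in the text; the variable names are hypothetical.

```python
from itertools import combinations

# The 14 prefixes of the top signature, as listed in the text ("∅" is the null prefix).
top = ["∅", "a", "i", "ki", "ku", "li", "m", "ni", "tu", "u", "vi", "wa", "ya", "zi"]

# Every way of keeping 13 of the 14 prefixes, keyed by the one prefix left out.
one_missing = {(set(top) - set(kept)).pop(): kept for kept in combinations(top, 13)}

# The prefixes reported in the text as missing from the five attested signatures.
attested = ["tu", "∅", "ku", "ya", "vi"]
unattested = sorted(set(one_missing) - set(attested))

print(len(one_missing))  # number of possible 13-prefix signatures
print(unattested)        # prefixes whose absence defines no attested signature
```

Of the 14 possible 13-prefix signatures, 9 therefore have no parastems of their own in the program's output.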

The second signature, which has 13 affixes and 56 stems, is missing the subject marker *tu-* (1st person plural). Its parastems are given in Table 7. As with the first signature, it is hard to see much in this table, but if we sort the parastems alphabetically (left to right), we find a more interesting pattern in Table 8.


Table 7: Parastems of second signature, sorted by decreasing frequency

It is striking that the parastems in Table 8 are composed of a small number of morphemes reused in different combinations, a good deal more so than was seen in Table 6. In Table 8, there are 56 parastems, and all but 4 begin with one of the 5 tense markers *ka, ki, li, me, na* (see Table 12). Linguistica has not yet identified those as a class of morphemes – that will have to await the second iteration – but the natural goal for the learner is to find a way to identify subclasses of data that are going to be easier to analyze than the entire vocabulary taken as a whole. A number of roots are reused a good deal; these are the high frequency roots of the language, often used as auxiliary verbs in certain respects: *-baki-* 'stay': 4, *-anz-* 'start': 4, *-ingi-* 'enter': 4, *-fany-* 'do, make': 5. (We emphasize here that the apparent identification of the prefixes in this table was done by us, not by Linguistica.)

<sup>6</sup>The reader might think that this means that for each of the 30 parastems in the top signature, the final prefix they encountered in the corpus was one of ∅, *ku* and *pa*. That is not quite right, because it is possible that they went through a state in which they had 14 different affixes, but not one of the three signatures ranked 2, 3, or 4; this is possible because if all of that signature's stems had moved up to the top signature, we would not see that phantom signature in the program's output.

Table 8: Parastems of second signature, sorted left to right

Let us take a step back. The top signature, as we sort by number of affixes, is the signature with 14 noun class prefixes plus the null prefix, and it does not include the locative prefix *pa-*, which does not appear until the 23rd signature, which is *a-i-ki-ku-li-pa-u-vi-wa-ya-zi*, with 30 stems. As we go down the list of signatures, as they get shorter (i.e., we go down a list of signatures which is sorted by the number of class prefixes contained), we have to wait until signatures 355 and 356 (∅*-al-k-l-z* and ∅*-ha-k-n-z*) till we find anything else. ∅*-al-k-l-z* is, to be sure, an error,<sup>7</sup> and ∅*-ha-k-n-z* is an error as well,<sup>8</sup> but it has the first occurrence of the principal negative prefix *ha*. The next 138 signatures are various subsets of the class prefixes, and then the next three signatures consist of two errors and the first significant appearance of the negative form of the verb. These three signatures are ∅*-al-l-v*; ∅*-h-k-t* (both errors) – and ∅*-ha-hawa-si*, which has 172 parastems associated with it. Let us look at this negative signature.

<sup>7</sup> It wrongly places an in the stem which should be in the prefixes.

<sup>8</sup>Again, it wrongly places an in the stem.

#### **4.3 More on parastems**

We have emphasized that the parastem that is revealed by Linguistica's algorithm is often analyzable, and that it frequently consists of several morphemes. But the parastems discovered need not be complex; if we look at very high frequency parastems to a signature in the first (left-most) layer, one of the highest is *-fanya* 'do, make', with 14,293 occurrences in the signature ∅*-a-i-ki-ku-li-m-tu-u-wa-ya-zi*. Another is *-taka* 'want', with 5,434 occurrences in the signature ∅*-a-i-ki-ku-li-m-u-wa-ya-zi*. Still, this is the exception rather than the rule.

#### **4.4 Verbal negation: the prefix** *ha-*

Verbal negation in Swahili is expressed in ways that are governed by the tense. The simplest pattern for Linguistica to find is the pattern in the simple past tense, as briefly illustrated in Table 9.


Table 9: Verbal negation

The ∅*-ha-hawa-si* signature brings together, for example, the forms *kusoma: hakusoma: hawakusoma: sikusoma*. These four forms are the infinitive, followed by three negative past tense forms, where *-ku-* plays the role of a tense marker (marking past tense; it is the negative tense marker corresponding to the affirmative *-li-*).

The 1st person singular prefix *ni* is replaced by *si* in negative verbs, and while the present tense negative will in native vocabulary bring with it (so to speak) a change of the final vowel to *-i*, that change is not observed in the past tense. Thus the three verbal forms in this signature *hasoma-hawasoma-si* are given in Table 9.

The examples in Table 10 illustrate the overwhelming dominance of the past tense negation occurring in this signature. Why do we not find something similar for the present tense? The principal reason is the one already mentioned: in the present tense, the final vowel is most often different than the final vowel in the corresponding affirmative present tense, and thus a method that looks for patterns based on a right-to-left scan is bound to fail, at least at this point in the analysis.

However, there are other ways for the correct analysis to emerge from the data. For example, there are four signatures with three items selected from the set *ha, hai, hatu, hawa, hazi, hu* and *si*. The signature *ha-hai-hawa*, for example, is associated with 181 stems. Of these 181, 75 begin with the tense marker *-ja-*, which is the tense marker for the negative perfect. We have listed the 32 parastems in this signature with the highest frequency, but the following generalization holds throughout: either the parastem begins with *-ja-*, or it is a (borrowed) verb root ending with its own final vowel (and hence has the same final vowel in the present tense negative as in the affirmative).

## **5 Second iteration**

Let us turn now to the next set of prefixes that we are looking for on the left edge of the Swahili word. We will try a simple procedure: we will consider all of the parastems uncovered during the previous iteration, and apply the same algorithm, treating the parastems as if they were the set of words. In the event, with some 301,000 words in the first iteration from the corpus, we now have 56,363 parastems to consider. From these parastems, 212 signatures arise, and some of the global information is presented in Table 13. The top signatures themselves are given in Table 14.

The signatures in Table 14 support an analysis in which this morphological position includes the morphemes in Table 12, where we have put the traditional designations on these tense markers. The morphs in Table 14 which are not tense markers (and which are errors) are: *lio, o, nayo, i* and *si*.


Table 10: Parastems of the signature ∅*-ha-hawa-si*


Table 11: 32 of the 181 parastems of the signature ∅*-hai-hawa*



Table 13: Signatures of the second position (tense marker) in the word



Table 14: Selected tense marker signatures

### **6 Suffixal system**

When we run our algorithm to find the suffixal system, we find 1,263 signatures, distributed in length in Table 15, and illustrated in Table 16 for the longest signatures.

#### **6.1 The verbal system**

We will focus first on the longest signatures, those with the largest number of affixes. This keeps us in the domain of verbal morphology.

On the whole, the analysis is remarkably good – or linguist-like, in any event. The forms in Table 16 are too long for a linguist's tastes, but the additional parsings given in Table 17 are almost entirely correct. We would like, first of all, for the final vowel to be separated as a distinct morpheme, and there is a bit more to be said about the -VC- morphemes on the left side of the arrows in this table.

These -VC- morphemes are called *extensions* in Bantu languages, and the most common ones are *-an-* (reciprocal), *-esh-/-ish-/-ez-/-iz-* (causative), *-ik-* (stative), *-iw-* (passive), *-uk-* (reversive).<sup>9</sup> The remaining cases are errors: *bish ki li mi ng sh ti uli ush uz*.<sup>10</sup>


Table 15: Final signatures

At the same time, Linguistica proposes the additional analyses given in Table 17. Figure 7 summarizes some of Linguistica's analysis, which really *should* be what is shown in Figure 8.

$$\begin{Bmatrix} \text{bish} & \text{esh} \\ \text{ez} & \text{ili} \\ \text{ish} & \text{iz} \\ \text{ki} & \text{li} \\ \text{mi} & \text{ng} \\ \text{sh} & \text{ti} \\ \text{uk} & \text{uli} \\ \text{uz} \end{Bmatrix} \quad \left\{ \begin{array}{c} \text{an} \\ \text{i} \\ \text{wa} \end{array} \right\} \quad \left\{ \begin{array}{c} \text{an} \\ \text{ish} \\ \text{i} \\ \text{uk} \\ \text{uk} \end{array} \right\} \quad \left\{ \begin{array}{c} \text{a} \\ \text{i} \\ \text{i} \\ \text{sh} \end{array} \right\} \quad \left\{ \begin{array}{c} \text{an} \\ \text{i} \\ \text{ish} \\ \text{v} \end{array} \right\} \quad \left\{ \begin{array}{c} \text{an} \\ \text{a} \\ \text{v} \end{array} \right\} \quad \left\{ \begin{array}{c} \text{a} \\ \text{i} \\ \text{v} \text{a} \end{array} \right\} $$

Figure 7: Almost final results

<sup>9</sup> In addition, there is the vocalic extension *-i-* (applicative), which surfaces as *-li-/-le-* with verb stems ending in two vowels (e.g., *ia, ea, aa, oa, ua*); this is discussed in Mpiranya 2014: 112, 146.

<sup>10</sup>The morph *-ele-/-ili-* in pairs like *-enda/-endelea* 'go/progress', *-penda/-pendelea* 'like, prefer' appears as a lexicalized intensive suffix.


Table 16: Selected final signatures

$$\begin{Bmatrix} \text{fish} & \text{sch} \\ \text{ez} & \text{ili} \\ \text{ish} & \text{iz} \\ \text{ki} & \text{ng} \\ \text{mi} & \text{ng} \\ \text{uk} & \text{ti} \\ \text{uk} & \text{ul} \\ \text{uz} \end{Bmatrix} \quad \begin{Bmatrix} \text{o} \\ \text{j} \\ \text{w} \end{Bmatrix} \quad \{a\}; \quad \begin{Bmatrix} \text{an} \\ \text{ik} \\ \text{ish} \\ \text{iz} \\ \text{uk} \\ \text{ush} \end{Bmatrix} \quad \{a\}; \quad \begin{Bmatrix} \text{on} \\ \text{j} \\ \text{j} \\ \text{v} \end{Bmatrix} \quad \{a\}; \quad \begin{Bmatrix} \text{an} \\ \text{ish} \\ \text{v} \end{Bmatrix} \quad \{a\}$$

Figure 8: Correct but not discovered


Table 17: Identification of extensions in final suffix sequences

## **7 Three other, simpler cases**

Linguistica's performance with grammatical stems is mixed: some good, some bad. We will briefly look at three.

#### **7.1** *-ote* **'all'**

The stem *-ote* 'all, entire, whole' is one that takes the pronominal prefixes of the sort found before a vowel. We do not find all its forms in the Helsinki corpus, and Linguistica places it in a signature with 11 other stems, all of which appear with the prefixes ∅*-a-i-ki-li-m-ni-tu-u-wa-ya-zi*, where we indicate the stem counts in the corpus (Figure 9).


Figure 9: Analysis of stem *-ote*

#### **7.2** *-angu* **'my'**

Linguistica's analysis here is not very good at all. Linguistica is permitted to assign multiple analyses to words, and it does so quite a bit with these words, as we see in Table 18. The stem *-angu* is identified in only two of the 15 forms present, and five different roots enter into the proposed analyses of the various forms. Even after studying the results, we are not certain why the algorithm wanders so far from the right answer. It does much better with a consonant-initial form such as *-ko*.


Table 18: -angu 'my'

#### **7.3** *-ko* **of location (from** *ku-o***)**

All but one of the forms in Table 19 are correctly analyzed, but because a word can be multiply analyzed, quite a few have more than one analysis, which is not what we want to see here. In addition to the root *-ko*, other stems are incorrectly seen in one form or another: *-iko, -ako, -uko, -mko, -yako, -tuko, -niko* and even *-kiko*. The wrong analyses here are clearly motivated by the presence of words that are grammatically irrelevant but which Linguistica's lack of understanding of real grammar makes it incapable of ignoring. This is an area that we hope to explore more in the near future.

### **8 Conclusions**

What do we think that the reader should make of the work presented here? It is, after all, a computational model trying to perform as well as a trained human linguist, and in many respects does not come up to the standards of the linguist. Some things it can do better than a human, such as paying careful attention to the fact that many different combinations of the class prefixes appear and yet not all with the same frequencies. And it can do relatively poorly on analyzing the possessive form *-angu*. Still, it is not unreasonable to observe that *if* we were to be handed a wordlist of 300,000 words in an unknown language, having Linguistica as a tool would be a fabulous resource.<sup>11</sup>

One otherwise sympathetic reader of an earlier version of this paper expressed the view that we have not given the reader a good enough sense of where Linguistica fails. We have pointed to a few cases of errors, but not given a global sense of the balance of success and failure. That criticism is entirely correct. There are two challenges that Linguistica does not handle well which are important for dealing correctly with a Bantu language, and we have seen them here. The first is deciding which signatures should be "collapsed," i.e., seen as the same signature. For example, in Figure 7 we see 18 signatures with small differences of prefixes. Linguistica should be able to take the next leap, which is to say that it should treat all of these as belonging to the largest signature, thus making predictions about possible but unseen words. It should, that is, realize that the affixes that are "missing" in various cases are only accidentally missing. This problem arises in every language that we have worked on (or know of!), and it is a challenge

<sup>11</sup>There are similar projects undertaken as we write this, including work on the Voynich manuscript (see https://en.wikipedia.org/wiki/Voynich\_manuscript) and on Iberian (see http://ibers.cat/corpuseng.html).




that we are working on. Note that solving this problem includes *not* placing a signature such as *ki-vi* in with the verbal prefix signatures (and many other cases of the same sort: we need to keep separate verbal, adjectival, and nominal prefix sets, which is not at all a trivial thing to do). A second problem (which is not unrelated to the first, but all of these problems are in one fashion or another related) involves correctly identifying the "left to right slots," so to speak: to correctly identify relative clause markers as something different from object markers, for example. Perhaps the reader should take away the message that very interesting things are done right by Linguistica's analysis, but we are not in a position to say that Linguistica really has the big picture of the morphology correctly sketched.
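The "collapsing" step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Linguistica's actual procedure: the data structure (a mapping from prefix sets to stems), the function names, the placeholder stems `STEM1` to `STEM3`, and the naive subset test are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Set

# A signature is modeled here as a set of prefixes; each signature maps to
# the stems attested with exactly that set. (Assumed representation.)
Signature = FrozenSet[str]


def collapse_signatures(
    signatures: Dict[Signature, Set[str]],
) -> Dict[Signature, Set[str]]:
    """Fold every signature whose prefixes are a subset of the largest one into it."""
    largest = max(signatures, key=len)
    collapsed: Dict[Signature, Set[str]] = {largest: set(signatures[largest])}
    for prefixes, stems in signatures.items():
        if prefixes == largest:
            continue
        # Naive criterion: a subset is treated as "accidentally incomplete".
        target = largest if prefixes <= largest else prefixes
        collapsed.setdefault(target, set()).update(stems)
    return collapsed


def words(signatures: Dict[Signature, Set[str]]) -> Set[str]:
    """All prefix-stem combinations licensed by a set of signatures."""
    return {
        f"{prefix}-{stem}"
        for prefixes, stems in signatures.items()
        for prefix in prefixes
        for stem in stems
    }


# Hypothetical toy input; the stems are placeholders, not corpus data.
toy: Dict[Signature, Set[str]] = {
    frozenset({"a", "i", "ki", "li", "u", "wa"}): {"STEM1"},
    frozenset({"a", "i", "u", "wa"}): {"STEM2"},
    frozenset({"ki", "vi"}): {"STEM3"},
}

collapsed = collapse_signatures(toy)
# Forms predicted by the collapsed analysis but unattested in the input.
predicted = words(collapsed) - words(toy)
print(sorted(predicted))
```

In this toy input the *ki-vi* signature stays separate only because *vi* is absent from the largest set. A subset test alone would not keep nominal, adjectival, and verbal prefix sets apart in general, which is exactly the non-trivial part of the problem noted above.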

One of the hotly debated topics in linguistic theory over the last forty years has been the question as to whether the human ability to learn language is something similar in character to other human abilities. Yet while the project that we have described here is one of the relatively few language-learning projects that works on large collections of raw text, it is not at all clear which group of linguists should be happy with the successes that we have documented here. Do the steps in the algorithm that we have used seem like the kinds of things that a Chomskian rationalist would expect to find in a Universal Grammar? The honest answer is simple: who knows? Suppose (as we believe, on better days) that the present project is one of the best examples of modeling grammar acquisition. If that is so, we have no reason to say that this is not what a rationalist learning algorithm looks like. On the other hand, the person who is uncomfortable with the *deus ex machina* character of Chomskian Universal Grammar could perfectly reasonably say that Linguistica's careful analysis of large amounts of data is exactly what puts this project in the empiricist, and not the rationalist, camp. But that too would be the voice of prejudice. The rationalist has no solid grounds for insisting that the language learner is incapable of handling large amounts of data and learning from it.

In the end, a principal interest of this project is that it allows us to build a truly explicit learning algorithm, working not on toy data (small amounts of selected data) but on very large and real data sets. That, in turn, provides us with a useful tool for better studying real and large corpora in linguistically sound ways.

### **References**

De Pauw, Guy & Gilles-Maurice De Schryver. 2008. Improving the computational morphological analysis of a Swahili corpus for lexicographic purposes. *Lexikos* 18.


Lindén, Krister. 2008. A probabilistic model for guessing base forms of new words by analogy. In *International Conference on Intelligent Text Processing and Computational Linguistics*, 106–116.

Mpiranya, Fidèle. 2014. *Swahili grammar and workbook*. Abingdon: Routledge.

Muhirwe, Jackson. 2007. Computational analysis of Kinyarwanda morphology: The morphological alternations. *International Journal of Computing and ICT Research* 1(1). 85–92.

# **Chapter 6**

# **Focus marking strategies in Igbo**

## Mary Amaechi<sup>a</sup> & Doreen Georgi<sup>b</sup>

<sup>a</sup>University of Ilorin <sup>b</sup>University of Potsdam

In this paper we describe the encoding of term focus in the Benue-Kwa language Igbo. Next to a discussion of focus marking devices that are available in the language and their different pragmatic usage conditions, we highlight the fact that the observed subject/non-subject split in focus encoding provides novel insights into the scope and generality of the focus marking generalization put forward in Fiedler et al. (2010). We argue that the distribution of focus markers is not solely regulated by pragmatic principles (viz. to avoid a default topic interpretation especially for subjects), but also by the syntactic position of the focus marker, its morphological realization conditions as well as a ban on too local subject movement.

## **1 Introduction**

This paper investigates focus marking in the Benue-Kwa language Igbo spoken in Southern Nigeria. Focus is an information-structural category; the constituent in focus is the most salient part of an utterance in a given discourse and signals the presence of alternatives that are relevant in the discourse for the interpretation of an utterance (see among others Jackendoff 1972, Dik 1997, Rooth 1985, Krifka 2008, Zimmermann & Onea 2011). We will be concerned with focus marking in Igbo, i.e. the linguistic encoding of focus by grammatical devices (Fiedler et al. 2010). Furthermore, we will concentrate on the term focus (viz. the encoding of focus on arguments and adjuncts) and leave verb and VP-focus for future research. Igbo is of interest for the study of focus marking because it is relatively rich in morphosyntactic devices that are available to mark focus, and to a certain extent the different strategies encode different pragmatic types of focus. But apart from describing the focus marking system of Igbo, the main aim

Mary Amaechi & Doreen Georgi. 2022. Focus marking strategies in Igbo. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 107–122. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393742

of this paper is to highlight a subject/non-subject asymmetry in focus marking. While such a split is cross-linguistically common, especially in (West) African languages, as documented in Fiedler et al. (2010) and Kalinowski (2015), the Igbo split provides (partial) counter-evidence for the generalization on focus marking splits put forward in Fiedler et al. (2010). They observe that in contrast to non-subject focus, subject focus must always be marked in some way in West African languages. In Igbo, however, subjects need not and sometimes must not be marked for focus by the usual devices applied in the language. This shows that the focus marking asymmetry between subjects and non-subjects can also have other sources than the pragmatic one identified in Fiedler et al. (2010) (viz. the avoidance of a topic interpretation for subjects). We argue that the asymmetry in Igbo results from an interaction of the syntactic position of the focus marker in the left periphery, a morphological realization condition on the focus head, and a general constraint on too local movement (anti-locality).

Before we can investigate focus marking strategies in Igbo, we must first introduce the basics of its morphosyntax (see e.g. Green & Igwe 1963, Carrel 1970, Manfredi 1991, Mbah 2006, Emenanjo 2015). The basic word order in a thetic sentence in Igbo is SBJ-V-DO-ADJ(uncts), see (1):

(1) Òbí hụ̀rụ̀ Àdá n'-áhíá.
  Obi saw Ada P-market
  'Òbí saw Àdá at the market.'

The language does not have verb-argument agreement but has rich derivational morphology (Uwalaka 1988). The case system is highly reduced with nom-acc distinctions in some personal pronouns. Igbo is a tone language that distinguishes high (á), low (à) and downstep (ā) tone; these encode both grammatical and lexical distinctions. We assume the clause-structure in (2) for an information-structurally neutral sentence with a transitive verb (Amaechi & Georgi 2019):

(2) [CP C [TP DPext [T' V+v+T [vP <DPext> [v' <v> [VP <V> DPint ]]]]]]

The verb moves successive-cyclically through v to T (lower copies indicated in angled brackets); the structurally highest argument (the external argument) undergoes obligatory EPP-movement to SpecT (see Amaechi & Georgi 2019 for empirical arguments for these assumptions).

The paper is structured as follows: In §2 we introduce the various focus marking strategies of Igbo and document which pragmatic types of focus they can express. §3 shows that term questions follow the same encoding strategies as focus, with an interesting difference with respect to (local) subjects. §4 discusses the subject/non-subject marking asymmetry and its relevance for cross-linguistic

generalizations on focus marking splits. In §5 we briefly summarize our analysis of a (subset of the) focus marking strategies that derives the observed asymmetry.

### **2 The expression of focus in Igbo**

In this section we will describe how arguments and adjuncts can be focused in Igbo. We summarize both the morphosyntactic means as well as the rough discourse-pragmatic use of the various strategies.<sup>1</sup>

#### **2.1 Morphosyntactic properties**

Focus on arguments and adjuncts in Igbo can be expressed by three different constructions that we will refer to as the in-situ strategy, the ex-situ strategy, and the cleft strategy, respectively. We address each of them in turn. In the in-situ strategy, the element that is focused occurs in its canonical position, i.e. in the position it also occupies in an all-new/out-of-the-blue sentence (where the corresponding constituent alone is not in focus), see (3); the focused elements are represented in small caps in the English translations. Hence, there is no syntactic marking of focus in the sentence; there is also neither a morphological indication of focus (e.g. by a focus marker) nor phonological highlighting (e.g. by stress) of the focused constituent. This strategy is used frequently in answers to questions and it is only available for non-subjects, i.e. direct objects, adjuncts (see 3) as well as indirect objects, but not for (local) subjects without further changes (see below):

(3) a. Context: Òbí hụ̀rụ̀ ònyé n'-áhíá? 'Who did Obi see at the market?'
  Òbí hụ̀rụ̀ Àdá n'-áhíá.
  Obi saw Ada P-market
  'Òbí saw Àdá at the market.' (DO focus)

<sup>1</sup>A subset of the basic morphosyntactic facts has already been described in the mostly descriptive literature on Igbo, although with a focus on question formation, see e.g. Goldsmith (1981), Ikekeonwu (1987), Uwalaka (1991), Mmaduagwu (2012), Nwankwegu (2015). However, these sources do not provide a systematic overview, do not take into account all focus marking devices or pragmatic usage factors; and most importantly, they do not offer a detailed study and analysis for the observed subject/non-subject split in focus marking. The data in this paper come from one of the authors, Mary Amaechi, who is a native speaker of Igbo. The data have been verified with several other native speakers, see the acknowledgements.


It is possible to focus (local) subjects in-situ after all if they are accompanied by a focus-sensitive particle like *sọ̀ọ́sọ̀*, 'only':

(4) Sọ̀ọ́sọ̀ Òbí hụ̀rụ̀ Àdá n'-áhíá.
  only Obi saw Ada P-market
  'Only Òbí saw Àdá at the market.'

In the ex-situ strategy, see (5), the focused phrase occurs in the clause-initial position and must be followed by the morpheme *kà* (which we will identify as a focus marker below). Note that this strategy is also not available for (local) subjects: they occupy the clause-initial position anyway due to Igbo's SVO word order, but they cannot co-occur with the morpheme *kà*. For all other XPs (direct objects, indirect objects, adjuncts) the construction is available.



In the ex-situ strategy focus is thus indicated both syntactically (by a change in the position of the focused element) as well as by morphological means (i.e. by the marker *kà* that follows the focused constituent).

Finally, all grammatical functions in Igbo can be focused by means of a cleft structure. Clefts in Igbo are biclausal: The main clause is introduced by the invariant 3sg nominative personal pronoun *ọ́* followed by the copula *bụ̀* (which usually occurs in identificational copula clauses). It embeds a CP in which focus is expressed by the ex-situ strategy, i.e. the focused XP occurs in clause-initial position and is followed by the morpheme *kà*, see (6). Note that subjects can also be focused in a cleft, even though they can still not co-occur with *kà* in the embedded clause, so we seem to be dealing rather with the in-situ strategy for focused subjects in the embedded clause of a cleft. Note that in contrast to what the English translation might suggest, these clefts in Igbo do not include a relative clause (see Amaechi 2018 for arguments for this analysis).

(6) Cleft strategy (Igbo)


The cleft strategy is the only way in Igbo to focus (local) subjects without the need of an additional focus-sensitive particle.

The ex-situ and the cleft strategy can both be applied long-distance (i.e. the focused element can occur in a structurally higher clause than the one to which it is thematically related) and to all grammatical functions, see (7) and (8) for object and subject focus, respectively. The same holds for adjuncts and indirect objects:

(7) Long ex-situ focus (Igbo)


(8) Long clefts (Igbo)


Note that while local subjects cannot be focused with the ex-situ strategy, long-distance ex-situ focus is possible for subjects, see (7). Crucially, however, a long ex-situ focused subject must be accompanied by the marker *kà*, just like (locally and non-locally) focused non-subjects in the ex-situ construction. This observation also provides evidence against the traditional view put forward in the descriptive literature on Igbo that the presence or absence of the morpheme *kà* is driven by the grammatical function of the focused constituent: non-subjects combine with *kà*, while subjects cannot do so. Since long-distance displaced focused subjects have to take *kà* as well, the decisive factor cannot be grammatical function; see below for an alternative proposal.

#### **2.2 Semantic focus types**

Focus expresses the presence of contextually salient alternatives that are relevant for the interpretation of a sentence (Rooth 1985, 1992). According to Zimmermann & Onea (2011) additional semantic and pragmatic factors can come into play and lead to different types of foci. Bazalgette (2015) distinguishes between simple focus (that has no function besides triggering alternatives) and pragmatic focus that is associated with implicatures (e.g. contrast, exclusivity) or with presuppositions (e.g. existence, exhaustivity). Van der Wal (2016) summarizes (and criticizes) tests used in the literature to identify these semantic and pragmatic focus types. We applied some of these tests to Igbo and checked which of the three syntactic focus strategies can be used in which function. The result is summarized in Table 1. In fact, the various syntactic strategies differ in the focus type they (preferably) express.

Space limitations prevent us from illustrating all contexts; we provide two below:

(9) Correction:

a. Statement A:
  Òbí hụ̀rụ̀ Àdá n'-áhíá.
  Obi saw Ada P-market
  'Òbí saw Àdá at the market.'


Table 1: Usage of the focus strategies. "✓" means that the strategy can be used to express this focus type, "∗" means that the strategy cannot be used in this context.

b. Corrective statement B:


(10) Numeral interpretation:


Thus, we can see that the morphosyntactic focus marking strategies differ in the semantic/pragmatic focus types they express.

## **3 Focus marking in questions and the morpheme** *kà*

As noted in Fiedler et al. (2010), focus marking is often not only found in focus constructions but also has other functions. Indeed, the same marking strategies described above for focus can also be found in constituent questions in Igbo. This is not surprising in light of the fact that wh-elements are usually considered to be inherently focused (see e.g. Rochemont 1986, Horvath 1986, Tuller 1986, Beck 2006, Haida 2007). When asking a constituent question in Igbo, the corresponding wh-pronoun can remain in-situ, can be moved to the clause-initial position (where it must be followed by the morpheme *kà*), or can be expressed by means of a cleft, see (11) for subject and direct object questions:<sup>2</sup>

(11) Question formation strategies (Igbo)


<sup>2</sup> In addition to the strategies listed in (11), Igbo also has other means to form questions, especially the so-called *kèdú*-construction, which shows different properties than the constructions discussed here and is also syntactically very different, viz. potentially a biclausal structure with an embedded relative clause, see among others Ikekeonwu (1987), Ndimele (1991), Nwankwegu (2015), Ogbulogo (1995), Amaechi (2018).

As in focus constructions, the ex-situ strategy is not available for (local) subjects since they can never co-occur with the morpheme *kà* (see 11a). In contrast to (non-wh) focused subjects, however, the in-situ strategy is available for wh-subjects even without the addition of a focus-sensitive particle (see 11b). Note further that question formation via the ex-situ and the cleft strategy can also apply long-distance, just like in focus constructions (cf. 7 and 8).

With this background on the formation of term focus and questions, we can discuss the nature of the morpheme *kà* that occurs with non-subjects in the ex-situ and the cleft construction as well as with long-distance ex-situ/clefted subjects. We identify this morpheme as a focus marker (a claim also made in Osuagwu 2015) for the following reasons. It is clear that this marker is related to the expression of focus: first, it occurs in sentences that express focus, viz. focus constructions and questions, but not in other Ā-dependencies such as topicalization or relativization; and second, it is syncretic with the disjunction 'or' in Igbo (cf. Nwachukwu 1987), viz. it expresses alternatives. Furthermore, we can exclude that *kà* is a focus-sensitive particle because it is obligatory in the contexts where it can occur (i.e. with non-subjects), it cannot associate with the focused XP at a distance (see 12, *kà* must be left adjacent to the focused constituent), and unlike *kà*, focus-sensitive elements like 'only' precede their associate (see 4).

(12) \* Ònyé Obi (kà) hụ̀rụ̀ (kà) nà m̀gbèdè (kà) n'-áhíá (kà).
  who Obi (foc) saw (foc) P evening (foc) P-market (foc)
  'Who did Òbí see in the evening at the market?'

We conclude that *kà* is a focus marker. Moreover, we also have evidence that it does not realize an inherent focus feature of focused constituents, but rather an element in the left periphery of the clause: It cannot attach to in-situ focus/wh-elements, cf. (3) and (11b), even though these also bear a focus feature (by assumption). We interpret these results as showing that *kà* is the exponent of a functional head related to focus (= Foc<sup>0</sup> in the split CP-system, cf. Rizzi 1997 et seq.). This view is supported by the observation that *kà* linearly follows the focused element (occupying SpecFoc) and attaches to whole phrases, not just to single words that are in focus: in (13b) only *áhíá* 'market' is focused, but *kà* cannot attach to it; rather, it has to follow the pied-piped PP that includes the focused element.

(13) b. M̀bà, [ N'-áhíá (\*kà) ochie ] **kà** Òbí hụ̀rụ̀ Àdá.
  no P-market (foc) old foc Obi saw Ada
  'No, Òbí saw Àdá at the old market.'

## **4 The subject/non-subject asymmetry in focus marking**

Even though the extensive study of focus marking strategies has shown that languages differ remarkably in how exactly focus is encoded, some cross-linguistic generalizations have emerged. In a study of about 20 West African languages (Kwa, Gur, Chadic), Fiedler et al. (2010) find a marking asymmetry between focused subjects and non-subjects in all of the investigated languages:

(14) Focus marking asymmetry (Fiedler et al. 2010), with NSF = non-subject focus and SF = subject focus:

	- a. NSF cannot or need not be marked syntactically.
		- i. NSF is restricted to in-situ positions (Bole, Duwai, Bade, Ngamo)
		- ii. NSF is not restricted to in-situ positions (Gur; Kwa; Hausa)
	- b. SF must be marked.

In a nutshell, Fiedler et al. (2010) found that while focus marking for non-subjects is excluded or optional, subject focus must obligatorily be marked by morphological devices (focus markers) and/or syntactic means (displacement, clefting). Skopeteas & Fanselow (2010: 171f) formulate this as an implicative relation: "If a non-canonical structure occurs with focus on non-subjects, it is expected to occur with focus on subjects too". Fiedler et al. (2010) also propose an explanation for the observed asymmetry: They assume that subjects in sentence-initial position are by default interpreted as topics; in order to overwrite this default interpretation in a focus context "the focused subject will have to be realized in a non-canonical structure, for instance, by means of special morphological markers and/or syntactic reorganization" (p. 249).

Igbo is not included in Fiedler et al.'s (2010) study of focus marking in West African languages, but it is interesting to consider it in light of their findings since it provides us with new insights into the scope of the generalization. Given that Igbo also exhibits subject/non-subject asymmetries in focus marking, as outlined in the previous sections, it is a typical West African language with respect to (14). As for non-subject focus, Igbo also behaves like other West African languages: focus marking is optional here since focused elements can stay in-situ (no syntactic displacement, no morphological focus marking by *kà*); alternatively, morphological and/or syntactic encoding is possible in the ex-situ and the cleft strategies. Subject focus marking does not entirely behave as expected according to (14). Focused (local) subjects do not have to be marked for focus at all: they can never co-occur with the focus marker *kà*; moreover, at least (local) wh-subjects

can occur in-situ without being syntactically displaced in any obvious way, but still the sentence is grammatical. In fact the ex-situ strategy (with displacement and the focus marker) is excluded for (local) subjects. Hence, focus marking on (wh-)subjects is *not* obligatory in Igbo. The only context in which (local) subjects must be "marked" for focus is when they are not wh-pronouns and they occur in-situ: this is only possible if a focus-sensitive particle is added, see (4). In any case, (local) focused subjects are incompatible with focus movement (ex-situ strategy, also involved in cleft formation) and morphological focus marking. The generalization for Igbo seems to be a bit more abstract: focus on subjects needs to be encoded morphosyntactically *in some way* to indicate the difference from an information-structurally neutral affirmative sentence as in (1) where the subject is interpreted as the (default) topic. "In some way" includes not only the regular focus marking strategies (not available for local subjects) but also the occurrence of focus-sensitive particles and wh-morphology (the form of the wh-subject pronoun differs from the non-wh personal pronouns, and the interrogative sentence with a wh-subject can thus be distinguished from an affirmative sentence). If wh-morphology also counts as a focus marking device, we can explain why wh-subjects can occur in the in-situ strategy without further focus marking devices, while focused subjects in the focus construction need to be accompanied by a focus particle to be able to occur in this construction: without the focus-sensitive particle attached to the subject, the sentence would be morphosyntactically indistinguishable from an affirmative sentence as in (1). Thus, Fiedler et al.'s (2010) generalization also holds for Igbo if focus marking comprises more than syntactic displacement and the use of focus markers.<sup>3</sup>

## **5 On the source of the marking asymmetry in Igbo**

In the previous section we came to the conclusion that focused subjects in Igbo can occur in the in-situ strategy without any focus marking (at least there is no regular encoding by syntactic displacement or attachment of a focus marker), even though there should be a pressure to encode especially subjects according to the Fiedler et al. generalization. In this section, we will briefly outline what the reason for the absence of focus marking with (local) subjects is. For more details, derivations and supporting empirical arguments, the reader is referred

<sup>3</sup>Aboh (2007) offers a different view on the "exceptionality" of wh-subjects: wh-elements are not necessarily inherently focused. The ex-situ ones are in focus, while the in-situ ones (moved to a low focus position) are not focused at all and hence do not receive focus marking.

to Amaechi & Georgi (2019), where we develop an optimality-theoretic analysis of the marking asymmetry for questions in Igbo. We have argued above that the focus marker *kà* realizes the left-peripheral head Foc<sup>0</sup>. We contend that in the ex-situ strategy and in clefts the focused non-subject constituent undergoes syntactic movement to SpecFoc. That the observed displacement involves movement rather than base-generation is supported by the fact that the dependency exhibits the hallmarks of movement (island-sensitivity, reconstruction).

(15) [FocP XPfoc [Foc' Foc<sup>0</sup> [TP ... [vP ... tXP ]]]]

We can derive the absence of the focus marker *kà* in the in-situ strategy by the following assumption: The head Foc<sup>0</sup> is morphologically realized as *kà* only if an overt (phonologically realized) XP occupies SpecFoc, otherwise Foc<sup>0</sup> remains silent (= contextual allomorphy). Since nothing moves (overtly) to SpecFoc in the in-situ construction, Foc<sup>0</sup> is not phonologically realized. In the ex-situ (and cleft) strategy where focused non-subjects move to SpecFoc, they surface at the left periphery of the clause and are accompanied by *kà* (we will not say more about the structure of clefts here). Since movement for non-subjects is optional, we get optionality in the ex-situ/cleft vs. in-situ strategy. The question that remains is why local focused subjects cannot co-occur with *kà*, not even optionally. We suggest that this is because they have to stay in the canonical subject position SpecT (see Amaechi & Georgi 2019 for empirical evidence); i.e., unlike focused non-subjects, they cannot undergo movement to the minimal SpecFoc position. And since no XP occupies SpecFoc, the head Foc<sup>0</sup> has to remain silent. One piece of evidence for this claim is the observation that subject movement in Igbo triggers a tonal reflex on the verb, but constructions with a preverbal focused subject do not exhibit this tonal reflex. That subjects cannot undergo local movement has been claimed for other languages as well (see among many others Chomsky 1986, Agbayani 1997 on the Vacuous Movement Hypothesis in English). A prominent (but not the only) account for this immobility of subjects is that the movement from SpecT (the canonical subject position in Igbo) to the local SpecFoc position would be too short, which is excluded by an anti-locality constraint (see Abels 2003, Grohmann 2003, Erlewine 2016 and references cited there for this concept). 
Long-distance movement of the subject (as well as clause-bound movement of non-subjects) covers a greater distance and does not qualify as too short by the definition of anti-locality. Non-subjects and long-distance moved subjects can thus occur in the ex-situ construction (where they trigger the realization of Foc<sup>0</sup> as *kà*) without any problems.
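The interaction just described can be made explicit with a small Python sketch. It is a simplified illustration rather than the optimality-theoretic analysis of Amaechi & Georgi (2019): the class `FocusedXP`, its fields, and the two function names are hypothetical, and anti-locality is reduced to a single check on local subjects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FocusedXP:
    """A focused constituent (field names are illustrative assumptions)."""
    is_subject: bool
    long_distance: bool  # focus is marked in a higher clause than the base clause
    moves: bool          # whether focus movement is attempted at all


def reaches_spec_foc(xp: FocusedXP) -> bool:
    """True if an overt XP ends up in SpecFoc.

    Anti-locality (simplified): a local subject sits in SpecT, and the step
    from SpecT to the minimal SpecFoc is too short, so it must stay in situ.
    """
    if xp.is_subject and not xp.long_distance:
        return False
    return xp.moves


def realize_foc_head(xp: FocusedXP) -> str:
    """Contextual allomorphy: Foc0 is 'kà' only if SpecFoc is overtly filled."""
    return "kà" if reaches_spec_foc(xp) else ""


# Local subject: movement is blocked, so Foc0 stays silent.
local_subject = FocusedXP(is_subject=True, long_distance=False, moves=True)
# Object: movement is optional, giving ex-situ (kà) or in-situ (no kà).
object_ex_situ = FocusedXP(is_subject=False, long_distance=False, moves=True)
object_in_situ = FocusedXP(is_subject=False, long_distance=False, moves=False)
# Long-distance subject: the movement is long enough, so kà appears.
long_subject = FocusedXP(is_subject=True, long_distance=True, moves=True)

for xp in (local_subject, object_ex_situ, object_in_situ, long_subject):
    print(xp, repr(realize_foc_head(xp)))
```

On these assumptions, the absence of *kà* with local subjects follows from the movement restriction plus the realization condition on Foc<sup>0</sup>, with no reference to grammatical function in the spell-out rule itself.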

## **6 Conclusions**

We have described the focus marking strategies in Igbo and the pragmatic contexts in which they are used. Igbo exhibits a subject/non-subject split in focus marking; however, this split partially challenges the generalization by Fiedler et al. (2010) on other West African languages because local focused subjects in Igbo cannot be marked by the regular focus marking devices. We provide an analysis according to which the occurrence of the focus marker *kà* is not solely regulated by pragmatic principles, but rather by an interplay of its high syntactic position, morphological realization rules and a ban on too local subject movement.

## **Abbreviations**


## **Acknowledgements**

We would like to thank Jeremiah Nwankwegu, Gerald Nweya, Basil Ovu, Chioma Eweama and Francis Umunnakwe for verification of the data. For valuable comments we are grateful to the audiences at "Quirks on subject extraction" (Singapore, August 2017), ACAL 49 (Michigan, March 2018), "Referential and relational approaches to syntactic asymmetries" (Stuttgart, March 2018), the syntax colloquium at the University of Frankfurt, and especially to Malte Zimmermann and Katharina Hartmann. This research is funded by the Deutsche Forschungsgemeinschaft (DFG), Collaborative Research Centre SFB 1287, Project C05 (Georgi).

## **References**


van der Wal, Jenneke. 2016. Diagnosing focus. *Studies in Language* 40(2). 259–301.

Zimmermann, Malte & Edgar Onea. 2011. Focus marking and focus interpretation. *Lingua* 121(11). 1651–1670. DOI: 10.1016/j.lingua.2011.06.002.

# **Chapter 7**

# **Focus marking and dialect divergence in Līkpākpáln (Konkomba)**

## Abraham Kwesi Bisilki<sup>a,b</sup>

<sup>a</sup>University of Education, Winneba, Ghana <sup>b</sup>The University of Hong Kong

In this paper, I discuss some salient aspects of focus marking in Līkpākpáln, a Mabia (Gur), Niger-Congo language spoken mainly in the northern parts of Ghana. I compare focus marking in two dialects of Līkpākpáln, namely, Līnàjùúl and Līchábɔ́l. I treat the notion of focus from the angle of Dik (1981). The data are drawn from multi-source corpora digitally recorded from stimuli-based elicitations and other natural discourse settings. The analysis reveals that the use of focus particles constitutes the most significant means of focus marking in Līkpākpáln, since that focus strategy is shared by both Līnàjùúl and Līchábɔ́l. Another feature common to Līnàjùúl and Līchábɔ́l is that there are syntactic restrictions on the distribution of the various focus particles in the sentence. On the question of divergences, I note that sentence-final vowel lengthening also assumes a focus function in Līnàjùúl. Also, the focus markers in Līnàjùúl (*ń, ńká* and a sentence-final focus particle of varied phonological shapes) differ in form from the focus markers *lé* and *lá* in Līchábɔ́l. Finally, I suggest that the focus marking differences between Līnàjùúl and Līchábɔ́l possibly stem from the fact that Līnàjùúl appears to have innovated a complex focus system vis-à-vis focus marking in the Mabia languages of Ghana. However, more thorough investigation into focus marking in other dialects of Līkpākpáln and Mabia is recommended. This will help establish whether the Līnàjùúl case is an isolated system or not.

## **1 Introduction**

The phenomenon of information structure (IS) and packaging is a sub-domain of linguistics that has received considerable attention from linguists globally, as exemplified in works such as Lambrecht (1994), Krifka (2007), Schwabe & Winkler (2007), Ameka (2010), Zimmermann & Onea (2011), and van Putten (2014). Often central in studies of information structure and packaging is the subject of focus. Paradoxically, the more linguists try to put questions regarding focus phenomena to rest, the more inexhaustible the topic becomes. This observation is borne out by the ever-increasing volume of focus-related analyses and counter-analyses. For instance, the focus status of the post-verbal *la* in Dagbani (a Gur, Niger-Congo language spoken in the Northern Region of Ghana) has been the source of a series of somewhat varying analyses, as reflected in Olawsky (1999), Hudu (2012) and Issah (2013). Van Putten (2016: 94) similarly notes the difficulty of finding exhaustive explanations for questions on focus phenomena. She makes this observation in relation to the elusive task that linguists face in trying to determine, for instance, when and why focus marking is resorted to in non-obligatory focus languages. What the foregoing clearly suggests is that the systematic investigation of the attributes of focus is likely to remain an active area of research for linguists, even with regard to the so-called well-researched languages.

> Abraham Kwesi Bisilki. 2022. Focus marking and dialect divergence in Līkpākpáln (Konkomba). In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 123–147. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393744

Līkpākpáln is a Gur (or Mabia<sup>1</sup>) language, whose speakers are mostly found in the Northern Sectors of Ghana. Speakers of Līkpākpáln call themselves Bikpakpaam, instead of the exonym Konkomba, which has often been used as a shared tag for both the people and their language. Some specific areas of their location include Saboba District in the Northern Region, and Nkwanta South and Nkwanta North Districts in (Northern) Volta. In other contexts, these areas of the Bikpakpaam location are alternatively referred to as the North-Eastern parts of Ghana (Schwarz 2009: 182). Simons & Fennig (2017) and Simons & Fennig (2018), in *Ethnologue: Languages of the world*, estimate the Līkpākpáln speaker population in Ghana at 831,000, besides other speakers reported in the Republic of Togo.

Līkpākpáln has a significant native speaker population, yet it is one of the little-researched languages of Ghana. In the view of Schwarz (2009), the need for basic grammatical descriptions of Līkpākpáln is still very high. This paper contributes to filling the basic knowledge gap on Līkpākpáln by investigating some aspects of focus marking in the language. The study introduces into the literature new data on focus constructions in Mabia. It does so from a comparative perspective by drawing data from two clan dialects of Līkpākpáln, namely Līnàjùúl and Līchábɔ́l. The following questions form the crux of this article:

<sup>1</sup> "Mabia" is an alternative name for Gur. The former is becoming a preferred label among native linguists on the Gur languages. I adopt the term Mabia in this article with reference to Bodomo (2017) and Bodomo & Abubakari (2017: 161).


## **2 Some basic grammatical features of Līkpākpáln**

As already indicated, Līkpākpáln is a Mabia language. It is further defined as belonging to the Gurma sub-cluster of the Oti Volta family (Naden 1988, Steele & Weed 1966). This section briefly explicates some linguistic features of Līkpākpáln.<sup>2</sup> This will provide a prerequisite for understanding discussions on focus marking in subsequent sections of the article.

Līkpākpáln is a word order language with SVO typology, as is generally known of Mabia and Kwa languages (Schwarz 2009). A simple sentence in Līkpākpáln can have the pattern SVO, SV or SVA, depending on whether the verb is transitive or intransitive. See the sentences in (1) below:


For purposes of focus, a non-subject constituent can be placed sentence initially, but the subject and the verb will remain in the fixed order of SV in the base clause as in (2b) below:

(2) a. Mánótī kɔ́r ú-kúló. [canonical clause]
   Name slaughter.prf cl-fowl
   'Manoti has slaughtered a fowl.'

b. Ú-kúló ńká Mánótī kɔ́r.
   cl-fowl foc Manoti slaughter.prf
   'It is a fowl that Manoti slaughtered.'

<sup>2</sup>The data used in this section are based on Līnàjùúl, but the same features hold for Līchábɔ́l.

In a ditransitive construction, the indirect object precedes the direct object as can be seen in (3a–b):

(3) b. Ú-nìmpū wár mē bī-sáá.
   cl-woman cut.prf 1sg.obj cl-food
   'A woman has served me food.'

Līkpākpáln has a noun class system based predominantly on class affixes, which also bear number semantics. Although prefixes dominate the class markers, some nouns have obligatory prefix-suffix pairs. Fewer nouns take only suffixes, which become the basis for their class assignment. Prefixes have corresponding class pronouns.<sup>3</sup>

The morphology of Līkpākpáln nouns is basically agglutinating. Verbs, on the other hand, have a poor morphology as there are only a handful of aspectual markers on the verb. Tense is a function of preverbal particles. Three tones, high ( ˊ ), mid ( ˉ ) and low ( ˋ ) are identified in Līkpākpáln (Steele & Weed 1966: 16).

The language has an initial orthography (which is reasonably phonemic) that was fashioned based on the Līchábɔ́l dialect. Any sequence of two vowels (whether representing a long vowel or not) in a word is treated as two syllable nuclei that may have the same or varying pitch levels (Bisilki & Akpanglo-Nartey 2017). Tone is generally not marked in the orthography of Līkpākpáln. I, nonetheless, mark tone in this study as tone has both lexical and grammatical functions in the language and is quite relevant in an analysis such as this.

## **3 The current study in perspective**

The notion of focus has been defined in many related ways. Van Putten (2016: 92) maintains that focus is the part of a sentence that carries the common-ground update. The information that is shared knowledge between the speaker and the listener in an interlocution constitutes the common ground. As speakers communicate, they try to increase their common ground or shared knowledge by introducing and linking new information to it. The new information that is introduced becomes an update to the common ground and, for that matter, the focal point. My reservation with van Putten's definition, though, concerns whether focus is always solely underpinned by what is necessarily new information. Indeed, Dik (1981: 59) argues that, for the purpose of stressing the importance of certain information or reactivating it in the addressee's memory, a speaker may place focus on information that the speaker knows is not new to the addressee. Similarly, Skopeteas et al. (2006: 3) hold that a given element may be focused. A given element, in the view of Skopeteas et al. (2006: 2), refers to information that the speaker believes the addressee already knows.

<sup>3</sup>A detailed discussion on Līkpākpáln noun classes can be found in Bisilki & Akpanglo-Nartey 2017 and in Winkelmann 2012.

Consequently, in the present analysis, I treat the notion of focus from the point of view of Dik (1981) and, subsequently, Dik (1997). Focus represents what is relatively the most important or salient piece of information in a given discourse context (Dik 1981: 42). Relatedly, a constituent with focus function is assumed to present information bearing upon the pragmatic information difference between the speaker and the addressee, as perceived by the speaker. This conceptualisation of focus recurs in Dik (1997: 326), who sees the focal information in a linguistic expression as the most essential or salient in a given communicative context and as considered by the speaker to be the most relevant for the listener to integrate into his/her pragmatic information. From this point of view, one can say further that a focus construction is a construction in which a particular constituent (i.e., the focal constituent) is placed in relative prominence or saliency by setting it off from the rest of the sentence or utterance in one way or another (Boadi 1974, Drubig & Schäffer 2001, Marfo & Bodomo 2005: 185). In terms of the expressive devices or strategies that languages deploy in marking focus, Dik (1981: 43) stipulates four ways:


Focus in different languages may use some or all of these devices in different combinations. Along a functional line, Dik (1981: 60) typologises focus broadly as either +contrast or −contrast. −Contrast focus is also termed completive or informative focus, whereas +contrast focus is also known as contrastive focus. Akrofi Ansah (2014), Schwarz (2009) and Skopeteas et al. (2006) further delineate finer-grained focus types: selective, expanding, restricting, replacive and parallel. Focus is completive (−contrast) when it serves merely to emphasize (or make prominent) a particular constituent, but contrastive when it contrasts the information of a constituent with that of another. The Līkpākpáln (Līnàjùúl) sentences in (4b) and (5b) illustrate completive and contrastive focus respectively.

(4) b. Ú bì-trí lóól lέ.
   3sg.sbj prog-push car foc
   'He is pushing a car.'

(5) b. Dábí, lóól ńká ú bì-trí.
   No, car foc 3sg.sbj prog-push
   'No, he is pushing a car.'

The discourse function of focus in (4b) is simply to lay emphasis or prominence on a car as the constituent bearing the relatively most salient information in the predication. On the other hand, the focus in (5b) serves to show the contrast that it is a car (and not a horse) that the man is pushing.

Focus can also be broad or narrow depending on whether it is assigned to the entire sentence (or its truth value) or a particular constituent or complement (Dik 1981: 44). See the Līkpākpáln (Līchábɔ́l) sentences (6) and (7):

(6) b. Bū-sūb lé lír.
   cl-tree foc fall.prf
   'A tree has fallen.'

(7) b. Ní yí lɔ́r lá.
   3sg is car foc
   'It is a car.'

(6b) presents an instance of broad focus where the entire sentence serves to fill the information gap in the knowledge of the listener (addressee). (7b), however, exemplifies narrow focus as a car is the only focus bearing constituent in the sentence. More on broad and narrow focus can be found in Hyman (2010: 96–97).

Additionally, this study draws on van Putten's (2016: 93) distinction between a focused (or in-focus) constituent, on the one hand, and a focus-marked constituent, on the other. The former applies to a situation where an element that constitutes the most crucial point of information is only so understood pragmatically, without the use of overt linguistic devices. The latter has to do with the situation where a focal element is explicitly marked for focus by any of a possible range of linguistic devices that may have a focus configuration function in a language. Another fact worth noting is that focus-marked elements are invariably in focus, whilst an element can be in focus without necessarily being focus-marked. In the present discussion, my concentration is mainly on cases of focus-marked constituents, as the presentation and analysis of data will show.

As already indicated (in §1), the current analysis investigates some aspects of focus marking in Līkpākpáln from a dialectal perspective. This is in the sense that the study does not only describe focus marking in Līkpākpáln, but it also compares two actively spoken dialects of the language (Līnàjùúl and Līchábɔ́l), with respect to the phenomenon in question. There has been a preliminary attempt at investigating focus in Līkpākpáln by Schwarz (2009). Nevertheless, Schwarz's study was limited to only Līchábɔ́l. Focus marking in Līchábɔ́l is here being reexamined and compared with focus marking as pertains in Līnàjùúl (which has no such previous study).

Beyond the agenda of providing a linguistic description of Līkpākpáln, the immediate motivation for this study is anchored in two issues. The first is to help settle some questions raised by a constant but cursory refrain in the few works on Līkpākpáln that the language is highly split into numerous clan dialects (Schwarz 2009: 182, Hasselbring 2006: 107). Although scholars have often been quick to point out that Līkpākpáln subdivides into numerous dialectal forms along clan units, the state of linguistic convergence or divergence between the supposed variants of Līkpākpáln remains unexplored or, at best, little-researched. Secondly, there are currently proposals being made by the Līkpākpáln speaker community in Ghana to re-design an orthography that will have a more unified outlook for various speakers of the language. I am privy to this initiative as a native speaker of Līkpākpáln and member of the speaker community. Such a practical need further calls for studies that potentially reveal how similar or different the dialects of Līkpākpáln spoken by various Bikpakpaam clan groupings are. I look at focus marking in Līnàjùúl first (see §5), and then at focus marking in Līchábɔ́l (see §7), before proceeding to compare the focus systems of the two in §8.

## **4 Data collection method**

This study is based mainly on primary data sets collected from the native speakers of Līnàjùúl in the Nkwanta North District of (Northern) Volta and Līchábɔ́l speakers of Saboba in the Northern Region of Ghana. I used both observation (including participant and non-participant types) and direct elicitation techniques for data collection. The direct elicitation involved four consultants (two Līnàjùúl speakers and two Līchábɔ́l speakers; one male and one female for each dialect) purposively selected. The observation data covered varied communicative domains such as during arbitration proceedings at the chief's court, religious ceremonies and family interactions.

With the direct elicitations, the prompts were a 10-minute video-clip, on the one hand, and picture stimuli (some original; others adapted from Skopeteas et al. 2006), on the other. Chelliah (2013: 61) attests to the advantage of using non-linguistic stimuli tasks such as video-clips and photographs. As Chelliah (2013) puts it, "Non-linguistic stimuli have several advantages: speakers do not require special training to understand the tasks and responses are clearly linked to stimuli and are, therefore, less ambiguous." The video and picture stimuli were designed based on local content in the Līkpākpáln speaker environment. For instance, I took pictures of different animals at different times and pictures of people engaged in different activities (e.g., block laying at a construction site, cooking, etc.). The essence of using familiar stimuli was to avoid the situation where culturally foreign stimuli could lead to consultant confusion. A further benefit of using stimuli was that taking the responses of Līnàjùúl and Līchábɔ́l speakers to the same prompts allowed for easy contrasts between the two dialects (see Majid 2012: 56).

Using the information structure questionnaire (QUIS, Skopeteas et al. 2006) as a guide, I also sometimes posed content questions to which consultants responded based on the stimuli. The use of question-answer pairs as a standard heuristic for determining focus constituents is also well established in the literature (see for example Dik 1978, Krifka 2007, Watters 1979). Utterances were recorded with a digital video device. With the aid of ELAN (4.9.4), the recorded speech was segmented and transcribed for analysis.

## **5 Focus marking in Līnàjùúl**

Focus marking in Līnàjùúl requires the use of special particles dedicated to marking focal elements. The use of particles for focus marking is also sometimes described as morphological (Childs 1997, Hartmann & Zimmermann 2009, Schwarz 2009, van Putten 2016). Rochemont (1986) equally demonstrates the use of prosodic resources and syntactic means in focus assignment. The focus particles in Līnàjùúl include *ń*, *ńká* and a sentence-final focus particle that assumes varying shapes, depending on the sentence-final consonant involved (this is discussed in detail in §5.3).

#### **5.1 Focus particle** *ń*

The particle *ń* is employed to focus-mark constituents in the utterances of Līnàjùúl speakers. It is worth noting that *ń* undergoes anticipatory homorganic assimilation, which yields the variants *ḿ* and *ŋ́* in speech:

(8) a. ii. Mákīnyì ḿ bì-sìí ū-pú.
   Name foc prog-insult poss-wife
   'Mákīnyì is insulting his wife.'

b. Ú-nìmpū ŋ́ kpá kī-nyɔ́k.
   cl.sg-woman foc have cl.sg-mouth
   'A woman is talkative (in a quarrel).'

c. Ú-bɔ́r ń yì kī-tìŋ.
   cl.sg-chief foc own.hab cl.sg-land
   'The chief owns the land.'

d. Bīmá ń yór ń-dàn.
   3pl.sbj foc take.pfv drink
   'They took (to take away) the drink.'

From the examples in (8), it can be noted that *ń* is used to mark focus on sentence-initial subjects. In its subject focus constructions, *ń* is restricted to the immediate post-subject slot before the canonical verb. Apart from simple subject constituents (nouns and pronominals) as shown in (8) above, *ń* can also be used to place focus on complex subject NPs, as the sentences in (9) reveal:


(9) b. Bī-nìnkpíí-b bī-tī-ká-nà ḿ bán ń-dàn.
   cl.pl-elder-cl.pl 3pl-loc-sit-def foc want.ipfv cl-drink
   'The elders sitting over there want a drink.'

*Ń* as a focus particle cannot be placed in an intervening position in complex focal NPs, but comes immediately after the last complement of the complex NP (i.e., it is placed at the rightmost edge of the focus phrase [FocP]). The subject focus role of Līkpākpáln *ń* makes it analogous to a similar subject focus marker (*ń*) in Dagbani and Gurenɛ (Kropp Dakubu 2003: 4, Issah 2013: 169, Issah & Smith 2018: 5, Akrofi Ansah 2014: 169). The Dagbani and the Gurenɛ data in (10a) and (10b) respectively confirm this observation:


The deletion of *ń* from a sentence in Līnàjùúl, nevertheless, does not render such a construction ungrammatical. As such, any of the sentences in (9) above can be re-presented grammatically as in (11), except that these sentences become neutral in their contextual meanings:


(11) b. Bī-nìnkpíí-b bī-tī-ká-nà bán ń-dàn. [neutral]
   cl.pl-elder-cl.pl 3pl-loc-sit-def want.ipfv cl-drink
   'The elders sitting over there want a drink.'

*Ń* cannot be used to mark focus on non-subject constituents. As earlier indicated, an attempt to re-position *ń* in any part of the sentence different from the immediate post canonical subject slot results in ungrammaticality of the sentence. This accounts for the unacceptable forms in (12):<sup>4</sup>

(12) b. \* Bīmá yór ń-dàn ń.
   3pl.sbj take.pfv drink foc
   'They took (to take away) the drink.'

c. \* Ń Bīmá yór ń-dàn.
   foc 3pl.sbj take.pfv cl-drink
   'They took (to take away) the drink.'

*Ń* equally serves both +contrastive and −contrastive focus functions. The specific context of utterance determines whether *ń* is used for emphasis or to code a meaning of contrast.
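The distribution of *ń* described in this section can be restated procedurally. The following is a minimal Python sketch and not part of the original analysis: the function names are hypothetical, and the onset classes contain only the environments attested in (8) and (9), where *ḿ* precedes *b*, *ŋ́* precedes *kp*, and *ń* precedes *y*. Treating every other onset as selecting *ń* is an assumption, since the full conditioning environment of the assimilation is not stated here.

```python
# Illustrative sketch only. The onset classes are limited to what is
# attested in examples (8)-(9); all other onsets defaulting to the plain
# form is an assumption, not a claim about Līnàjùúl phonology.

N_FOC = "\u0144"          # ń: default form, e.g. before y in (8c-d)
M_FOC = "\u1e3f"          # ḿ: before b, as in (8a-ii) and (9b)
NG_FOC = "\u014b\u0301"   # ŋ́: before kp, as in (8b)

VELAR_ONSETS = ("kp",)    # attested labial-velar onset
LABIAL_ONSETS = ("b",)    # attested labial onset


def subject_focus_particle(following_word: str) -> str:
    """Return the surface shape of the subject focus particle before a word."""
    onset = following_word.lower()
    if onset.startswith(VELAR_ONSETS):
        return NG_FOC
    if onset.startswith(LABIAL_ONSETS):
        return M_FOC
    return N_FOC


def mark_subject_focus(subject: list, predicate: list) -> list:
    """Place the particle at the right edge of the subject NP, before the verb."""
    if not predicate:
        raise ValueError("a predicate is required after the focus particle")
    return subject + [subject_focus_particle(predicate[0])] + predicate


if __name__ == "__main__":
    # Rebuilds the word order of (8a-ii) from its neutral S V O order.
    print(" ".join(mark_subject_focus(["Mákīnyì"], ["bì-sìí", "ū-pú"])))
```

Because the particle is always appended to the whole subject list, the sketch also reflects the observation that *ń* cannot intervene inside a complex subject NP.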

#### **5.2 The particle,** *ńká* **as a focus marker**

*Ńká* is used to focus-mark only fronted non-subject constituents. In this case, a focus phrase (i.e., comprising both the focus particle and the focal constituent) must be placed extra-clausally. Extracting only the focus particle or only the focal constituent renders the sentence ungrammatical. Although Līkpākpáln is not a Kwa language, the requirement that *ńká* necessarily collocates with its focal target in the extra-clausal position is in line with Ameka's (2010) observation that in some Kwa languages, both a focus particle and the focalised element must be placed together in a fronted position. *Ńká* can be used to focus-mark objects, as the sentences in (13) show:

<sup>4</sup>Please note that "\*" in front of an item means ungrammatical/unacceptable form.

(13) b. Tī-kpēn ńká ú-nìmpū ká ŋáándέ.
   cl-soup foc cl-woman sit boil.ipfv
   'A woman is preparing soup.'

c. Ú bī-nyɔ́ ń-dám. [canonical]
   3sg.sbj prog-take cl-drink
   'S/he is taking (drinking) a drink.'

d. Ń-dám ńká ú bī-nyɔ́.
   cl-drink foc 3sg.sbj prog-drink
   'S/he is taking (drinking) a drink.'

Also, the sentences (14b) and (14d) below provide instances of *ńká* marking focus on an adjunct and an adpositional phrase respectively.


With reference to the sentences cited so far, one can also observe that fronting constituents for focus assignment with *ńká* does not trigger a resumptive pronoun in the base clause. Syntactically, *ńká* takes the slot immediately after its focal host, but must also precede the subject argument in its canonical clause position, which can be either a pronominal or a lexical subject.

Unlike the *ń* focus marker, a deletion of the *ńká* particle from a focus construction renders it ungrammatical, unless such a deletion is concomitant with a re-positioning of the focal constituent in its base position (in situ). As such, (14b) and (14d) become ill-formed constructions as in (15a) and (15b) below:

(15) b. \* Kī-sáá-k nē ú bī-kɔ́r.
   cl-farm-cl in 3sg.subj prog-weed
   'S/he is weeding inside the farm.'

Nevertheless, (15a) and (15b) would be well formed if there were in situ object placement alongside the deletion of *ńká*. Hence, (15a) and (15b) as re-presented in (16a) and (16b) stand as grammatically correct sentences:

(16) b. Ú bī-kɔ́r kī-sáá-k nē.
   3sg.subj prog-weed cl-farm-cl in
   'S/he is weeding inside the farm.'

In terms of discourse context, it was observed that *ńká* is mostly used for a contrastive focus function. It appears that when a non-subject constituent is to be focused −contrastively, a sentence-final particle (discussed in §5.3) is preferred, whereas +contrastive focus calls for *ńká*.

#### **5.3 Sentence final focus particle**

There is a phenomenon in Līnàjùúl where a focus particle is placed sentence finally for the marking of focus, mostly, on post verbal constituents. This is as shown in the sentences in (17) below:

(17) b. Ú-pìì bī-ŋmáán bī-sū-b áá-fár rέ.
   cl-sheep prog-chew cl-tree-cl gen-leaves foc
   'A sheep is chewing leaves of a tree.'

c. Ńtáánáá chá kī-sáá-k kέ.
   Name go.ipfv cl-farm-cl foc
   'Ńtáánáá is going to the farm.'

ii. Chákún dɔ́ lī-jà-l tàáb bέ.
   Cat lie-ipfv cl-chair-cl under foc
   'It is lying [under a chair]<sub>foc</sub>.'

While *έ* remains invariant in all instances of the sentence-final focus particle, the consonant varies, depending on the final consonant segment(s) of the sentence-final word of the constituent in focus. This means that the sentence-final focus particle is constructed by copying the sentence-final consonant (that is, the word-final consonant immediately before the focus particle) and adding *έ* to it. One may then state that the shape of the sentence-final focus marker in Līnàjùúl is phonologically conditioned. The influence of phonological environment on the choice of focus particles is also found in Sissali (a sister Mabia language spoken in Upper West Ghana). Dumah (2017) shows that in Sissali, when a focal constituent ends in a consonant, *nέ* is used for focus, while *rέ* is used where such a constituent ends in a vowel. The following Sissali sentences in (18) from Dumah (2017) illustrate the phenomenon:

(18) a. Gyinaŋ<sup>i</sup> nέti/\*rέ Dùmà sί gύnnὶ wὺjίŋ.
   Today foc Dùmá fut learn lesson
   'Today (and not any other day) that Dùmà will learn a lesson.'

b. Daari<sup>i</sup> rέ/\*nέti yɔ́bɔ̀ tèŋ.
   Name foc buy book
   'Daari (and not any other person) has bought a book.'

The inappropriateness of \**rέ* in (18a) is because the focused constituent ends in a consonant and the reverse accounts for \**nέ* in (18b). However, in the case of Līnàjùúl, when a post verbal focal constituent ends in a vowel, a focus particle is not used. Instead, there is an increase in the duration/extra lengthening of the final vowel (although this still requires an acoustic investigation to be more formally established).
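The phonological conditioning described above can be made concrete with a small sketch. The Python below is illustrative only: the function names, the assumed vowel inventory, and the treatment of the final letter of the word as its final consonant are simplifying assumptions, and vowel lengthening is returned as a label rather than modelled acoustically. The Sissali rule simply restates the description attributed to Dumah (2017).

```python
import unicodedata
from typing import Optional, Tuple

E_HIGH = "\u025b\u0301"  # high-toned open e, the invariant vowel of the particle
# Assumed vowel letters (a e i o u, open o, open e, small capital i, upsilon);
# this inventory is an illustration, not a phonemic analysis.
VOWELS = set("aeiou\u0254\u025b\u026a\u028a")


def final_segment(word: str) -> str:
    """Return the last letter of a word, ignoring tone marks and punctuation."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    letters = [ch for ch in decomposed if ch.isalpha()]
    if not letters:
        raise ValueError(f"no letters in {word!r}")
    return letters[-1]


def linajuul_final_focus(word: str) -> Tuple[str, Optional[str]]:
    """Sentence-final focus strategy after a post-verbal constituent.

    Consonant-final word: copy the final consonant and add the high-toned
    open e (lóól gives l + vowel, as in (4b)).
    Vowel-final word: no particle; the final vowel is lengthened instead.
    """
    last = final_segment(word)
    if last in VOWELS:
        return ("vowel-lengthening", None)
    return ("particle", last + E_HIGH)


def sissali_focus(word: str) -> str:
    """Sissali choice as described in the text: n-form after a consonant,
    r-form after a vowel."""
    return ("r" if final_segment(word) in VOWELS else "n") + E_HIGH


if __name__ == "__main__":
    for w in ["lóól", "áá-fár", "kī-sáá-k", "tàáb", "ú-kúló"]:
        print(w, linajuul_final_focus(w))
```

The contrast between the two functions is the point of the sketch: Sissali selects between two fixed particles, whereas Līnàjùúl builds the particle from the host word itself and falls back on a prosodic strategy when no final consonant is available.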

The sentence final focus particle can be used to focalise both simple and complex non-subject constituents, including even entire VPs as can be seen from the examples in (17). (19) specifically illustrates VP focus with the sentence final focus particle.

(19) b. Ú jɔ́n bī-sū-b bέ.
   3sg.sbj climb.pfv cl-tree-cl foc
   'He [climbed a tree]<sub>foc</sub>.'

In (19b), we see a sentence-final particle, *bέ*, used to mark focus on an entire VP. The scenario is that the speaker of (19a) saw the addressee knock her (the addressee's) son on the head. This prompted the speaker's question, leading to the addressee's response in (19b), in which the entire VP is in focus. It must also be reiterated that the sentence-final focus particle mainly has a −contrast discourse function. Thus, it serves more to give relative emphasis or prominence to a particular constituent rather than to contrast.

Also, the non-use or the deletion of a sentence final focus particle does not make a sentence ungrammatical. In this sense, sentence final focus particles behave like particle *ń* discussed in §5.1. The sentences in (20) are a representation of (17a–b), except that they are now neutral forms.

(20) b. Ú-pìì bī-ŋmáán bī-sū-b áá-fár.
   cl-sheep prog-chew cl-tree-cl gen-leaves
   'A sheep is chewing the leaves of a tree.'

Thus, in (20) we find that the sentences with sentence final focus particle in (17a–b) are represented as grammatical forms without the focus markers.

## **6 Any combinatorial permissibility between the Līnàjùúl focus particles?**

A careful analysis of the Līnàjùúl focus particles affirms that, to a large extent, they are in complementary distribution in clauses or sentences. The co-occurrence of any two of the focus particles in the same clause, or even in separate conjunct clauses, usually results in an ill-formed structure, as can be seen in the examples in (21) below.

(21) b. \* Kónjà ń pēn í-līk kē kūn kī-sáá-k kέ.
   Name foc borrow-pfv cl-money conj farm.pfv cl-farm-cl foc
   'Kónjà borrowed money and used it to make a farm.'

(21a) is a simple clause while (21b) is a compound clause, yet the concurrent hosting of two focus particles is unacceptable in either case.
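Taken together, §5 and §6 amount to a choice among mutually exclusive strategies. The Python sketch below is one hypothetical way of stating that choice. The function and parameter names are invented for illustration, the sentence-final particle is represented by the placeholder `FOC` rather than by its phonologically conditioned shape, assimilated variants of *ń* are ignored, and the preference of *ńká* for contrastive focus is treated as categorical although it is described above only as a tendency.

```python
NKA = "\u0144k\u00e1"   # the particle for fronted non-subject focus
N_FOC = "\u0144"        # the subject focus particle (unassimilated form)


def linajuul_focus(subject, verb, complement, target, contrastive=False):
    """Return the word order of a clause carrying exactly one focus marker.

    subject, verb, complement: lists of words in canonical S V X order.
    target: "subject" or "non-subject".
    """
    if target == "subject":
        # The particle sits immediately after the whole subject NP.
        return subject + [N_FOC] + verb + complement
    if target == "non-subject":
        if not complement:
            raise ValueError("no post-verbal constituent to focus")
        if contrastive:
            # Fronted focus phrase (constituent + particle), then S V,
            # with no resumptive pronoun left in the base clause.
            return complement + [NKA] + subject + verb
        # In situ focus: the particle closes the sentence.
        return subject + verb + complement + ["FOC"]
    raise ValueError(f"unknown focus target: {target!r}")


if __name__ == "__main__":
    # Word order of (2b), built from the canonical clause in (2a).
    print(" ".join(linajuul_focus(["Mánótī"], ["kɔ́r"], ["ú-kúló"],
                                  "non-subject", contrastive=True)))
```

Since each branch inserts a single marker, the function cannot generate the doubly marked patterns that (21) shows to be unacceptable.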

## **7 Focus marking in Līchábɔ́l**

Two particles, *lé* and *lá*<sup>5</sup> have been identified as the focus markers in Līchábɔ́l (Schwarz 2009). In Schwarz's study, emphasis was more on establishing the divergent status of *lé* and *lá* in the Līchábɔ́l grammatical system. Schwarz appears to have ended on the following key conclusions, inter alia:


<sup>5</sup> In the present analysis, I might not have covered all the aspects of *lé* and *lá* that were dealt with in Schwarz (2009), even though I might have also introduced some new perspectives on these focus particles here. The reason is that this paper's interest is more in two issues: 1. to articulate focus marking differences between Līnàjùúl and Līchábɔ́l, and 2. to address any gap(s) that were observed in Schwarz's analysis.

Much as I accede to Schwarz's arguments, one of my points of disagreement lies with the claim about the non-obligatoriness of *lé* and *lá* in the sentence (Schwarz 2009: 184–185). In my observation, it is only *lá* which is possibly non-obligatory in every context of its use as a focus particle. *Lé*, on the other hand, has an obligatory use in the case of certain pronominal subjects (which I tentatively typologize as strong, disjunctive pronouns) and also where a non-subject is placed sentence-initially, as the data in (22) and (24) suggest. Furthermore, a new dimension that I offer in the present analysis is that *lá* can also be used to lay special emphasis on the entire proposition of a clause, rather than only on constituents within the clause. This is illustrated in example (28).

#### **7.1 Particle** *lé*

When marking focus on a focal subject, both *lé* and the focused constituent are located within the canonical clause as can be seen in (22a), (22b), and (22c) respectively. However, when it is a non-subject focal constituent, both particle *lé* and a focalized element are fronted as in (22e):

(22) b. Ú-píí lé kpá b-ūmɔ́-b.
   cl.sg-woman foc have cl.sg-mouth-cl
   'A woman is talkative (in a quarrel).'

c. Úmáá lé nyún ń-dáán.
   3sg foc drink.pfv cl-drink
   'He is the one who took a drink.'

d. Ń wáá ú-pìì. [canonical sentence]
   1sg.sbj see.ipfv cl-sheep
   'I see a sheep.'

e. Ú-pìì lé ń wáá.
   cl-sheep foc 1sg.sbj see.ipfv
   'A sheep is what I see.'

Noteworthy is that whether in subject or non-subject focus, *lé* invariably occupies the slot immediately after the focal constituent, as can be seen from (22). *Lé* is not placed in an intervening position within the complements of a focal constituent (i.e., in the case of a complex constituent), even when it is used for narrow focus on only a part of the complex constituent, as illustrated in (23). The question (23a) shows that the focus is narrowed to only *ŋì-lé* 'two'. Yet the *lé* focus marker in (23b) remains in the same position as would be the case if the entire NP, *ŋì-tà ŋì-lé* 'two tyres', were in focus:

(23) b. (ŋì-tà) ŋì-lé lé pú.
   cl-tyre two foc spoilt
   'Two (tyres) got spoilt.'<sup>6</sup>

Contrary to Schwarz (2009), *lé* is found to be obligatory in certain focus conditions. This occurs when certain strong, disjunctive pronouns take the subject position and also when a non-subject constituent is moved to the left periphery. The examples in (24) further illustrate the use of *lé*:

(24) b. Min lé ŋmán ŋí-tùùn.
   1sg.sbj foc eat.pfv cl-beans
   'I ate beans.'

c. Tìmīn lé jín bī-sáá.
   1pl.sbj foc eat.pfv cl-food
   'WE ate (the) food.'

Because *lé* is obligatory in contexts such as (24), the sentences in (24) become ungrammatical when re-presented without it, as in (25) below.

(25) b. \* Min ŋmán ŋí-tùùn.
   1sg.sbj eat.pfv cl-beans
   'I ate (the) beans.'

c. \* Tìmīn jín bī-sáá.
   1pl.sbj eat.pfv cl-food
   'We ate (the) food.'

<sup>6</sup> (23b) is adapted from Schwarz (2009: 187).

*Lé* becomes necessary in (25a) because of the fronted object (a non-subject constituent). Similarly, *lé* is indispensable in (25b) and (25c) because of the particular pronominal subjects involved.
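The conditions under which *lé* is obligatory can be restated as a simple check. The Python sketch below is illustrative: the function names are hypothetical, and the set of strong pronouns contains only the two forms whose *lé*-less variants are starred in (25b–c), since the membership of this pronoun class is characterised only tentatively in the present analysis.

```python
import unicodedata


def _norm(word: str) -> str:
    """Normalise case and Unicode composition so that tone-marked forms compare equal."""
    return unicodedata.normalize("NFC", word).lower()


# Strong (disjunctive) subject pronouns attested in (24b-c) and (25b-c).
# The full membership of this class is an open question; this set is an assumption.
STRONG_PRONOUNS = {_norm(w) for w in ("Min", "Tìmīn")}


def le_is_obligatory(subject: str, non_subject_fronted: bool = False) -> bool:
    """True where the particle cannot be dropped: a fronted non-subject
    constituent or a strong pronominal subject."""
    return non_subject_fronted or _norm(subject) in STRONG_PRONOUNS


def is_well_formed(subject: str, has_le: bool, non_subject_fronted: bool = False) -> bool:
    """A clause lacking the particle is ill-formed only where it is obligatory."""
    return has_le or not le_is_obligatory(subject, non_subject_fronted)


if __name__ == "__main__":
    print(is_well_formed("Min", has_le=False))    # the pattern starred in (25b)
    print(is_well_formed("Ú-píí", has_le=False))  # lexical subject, particle optional
```

Keeping the pronoun list as data rather than logic makes it easy to extend once the class of strong pronouns has been established more firmly.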

#### **7.2 Particle** *lá*

The particle *lá* as a focus marker in Līchábɔ́l is constrained to sentence-final position in a similar way to the sentence-final focus particle in Līnàjùúl. In its post canonical verb position, *lá* is immediately postposed to the constituents that it focus-marks. That is, the element in focus precedes *lá* in terms of nearness to the canonical verb. *Lá* can be used to mark focus on any non-subject constituent, as can be noted in (26) below:


In (26a-ii), *lá* is used to mark focus on an adverbial. *Lá* also marks focus on an adjective, and in (26c-ii) it marks focus on a complex VP.


Furthermore, *lá* also occurs when the element of focus is just the verb. (27) is an example to this effect:

(27) a. Lá bī dá ídɔ́?
   q 3pl.sbj buy.pfv wood
   'Where did they buy the (fire)wood?'

b. Bī sūn lá.
   3pl.sbj steal.pfv foc
   'They stole it.'

Additionally, Schwarz (2009) hints at the fact that *lá* can be used to add a kind of emphasis to the meaning of a focal constituent. A further discovery that the present study brings on board is that such emphasis by *lá* can also apply to the meaning of the entire sentence. This is observed to happen when, in discourse, a speaker wants to be sarcastic or, in earnest, to indicate that the idea or situation being stated is beyond the ordinary. Example (28) below illustrates this:

(28) Jàgrì Jàgrì kpɔ́ has ŋì-mɔ́bìl cl-money lá. foc '[Jàgrì has money]foc.'

The discourse-contextual interpretation of sentence (28) is not to emphasize or contrast only a portion of the sentence. Rather, the contextual meaning is that 'Jàgrì is, indeed, rich or he is richer than the ordinary.' One must also note that in cases like (28), *lá* is still retained sentence-finally. An interesting commonality across every contextual use of the *lá* focus marker is its optionality in the sentence. Hence, examples (28) and (27b) are still grammatically correct without *lá* (although their contextual meanings may become inappropriate), as (29) shows.

	- b. Bī 3pl.sbj sūn. steal.pfv 'They stole.'

Finally on *lá*, Schwarz (2009) acknowledges that there are similar particles like *lá* in Līchábɔ́l, but with different functions. Possibly, a more appropriate way to put this is to say that there are homophonous *Lás* in Līchábɔ́l-Kpakpaln. There is a focus marking *lá* and there is also an interrogative particle *lá*, meaning roughly "where" (see, for instance, data example 27a).

## **8 Highlights of focus marking divergences between Līnàjùúl and Līchábɔ́l**

The foregoing discussions (in sections above) reveal that Līnàjùúl and Līchábɔ́l have intriguing similarities as well as differences, with respect to the phenomenon of focus marking. In the first place, the two dialects use special focus markers (in this case focus particles) for marking focal constituents. To that extent, both Līnàjùúl and Līchábɔ́l conform to the common linguistic phenomenon, where the focus systems of Mabia languages involve the use of focus marking particles. Nonetheless, whereas Līnàjùúl has three particles (*Ń, ńká* and a clause final particle of varying shapes), *lé* and *lá* are the only particles used for coding focus in Līchábɔ́l. However, Līnàjùúl further appears to draw on the prosodic feature of duration/sentence final vowel lengthening for focus assignment (see §5.3), whereas this does not occur in Līchábɔ́l. This means that while focus marking remains mainly morphological in Līchábɔ́l, Līnàjùúl has both morphological and prosodic strategies for marking focus.

There is the temptation to state that the focus marking differences between Līnàjùúl and Līchábɔ́l are due to the fact that Līnàjùúl has innovated a more complex focus system, while also showing signs of decay in that regard. This comes up somewhat clearly when one considers the focus marking system of Līnàjùúl vis-à-vis the larger Mabia framework. It can be said, for instance, that the use of prosody and a phonologically conditioned sentence final focus particle of varying shapes is currently not known to be prevalent among the Mabia languages. What comes close to the latter case in Līnàjùúl is the pattern in Sissali, where the focus markers *nέ* and *rέ* alternate depending on whether the focalized constituent ends in a consonant or a vowel (Dumah 2017). Also, *lá*, which is a prevalent focus particle in the focus systems of many Mabia languages of Ghana, such as Dagbani, Dagaare, Moore, Kusaal, Mampruli (Bodomo 1997: 93, Kropp Dakubu 2003, Issah 2013, Saanchi 2005), is synchronically not used for focalization in Līnàjùúl. The only trace of *lá* in Līnàjùúl is its use as a question particle (see example 17f).

There is a possibility that the *lá* focus marker existed in Līnàjùúl at a certain point but was lost through linguistic evolution over time, so that it is absent synchronically. The innovations presently noted in the Līnàjùúl focus system correlate with a pattern recently found in its noun class system (Bisilki & Akpanglo-Nartey 2017). In a study of Līnàjùúl noun classes, Bisilki & Akpanglo-Nartey (2017) similarly found that Līnàjùúl is evolving further away from the prototypical Gurma noun class characteristics.

It must, however, be indicated that despite the attested focus marking incongruences between Līnàjùúl and Līchábɔ́l, no comprehension challenges obtain between the native speakers of these variants. Thus, the degree of mutual intelligibility between Līnàjùúl and Līchábɔ́l is high enough to warrant smooth intercommunication between their respective speakers.

## **9 Conclusion**

I have examined some salient aspects of focus marking in Līkpākpáln in this article. In particular, I have discussed focus marking strategies and the syntax of focus constructions in the language. Of more interest in the article is the comparison of the focus marking systems of the Līnàjùúl and the Līchábɔ́l dialects of Līkpākpáln. The comparison (see §8) has revealed some generic similarities, but more intriguing divergence in the shape and number of focus particles. Within the Mabia focus systems, Līnàjùúl was also found to be a bit more diversified by using phonologically conditioned sentence final focus markers. Yet this finds a kind of analogous pattern in a sister Mabia language, Sissali (see §5.3), which uses phonologically conditioned focus markers. Another point of dissociation with Līnàjùúl is the non-focus function of *lá*, which is a common focus marker in several Mabia languages.

Finally, I recommend that investigation of focus marking in other dialects of Līkpākpáln be undertaken. This will help establish whether the focus system of Līnàjùúl is truly an isolated innovation or whether the pattern is a shared linguistic tendency in Līkpākpáln. Similarly, more thorough studies into the phonological possibilities in focus marking in Mabia need to be pursued. Both the cases of Sissali and the Līnàjùúl dialect of Līkpākpáln raise this interest.

## **Abbreviations**

This article adheres to the Leipzig Glossing Rules, with the following additions:


## **References**

Akrofi Ansah, Mercy. 2014. Information packaging: Focus marking and focus constructions in Leteh (Larteh). *Nordic Journal of African Studies* 23(3). 162–179.


Bodomo, Adams. 1997. *The structure of Dagaare*. Stanford, CA: CSLI Publications.


# **Chapter 8**

# **Edges and extraction: Evidence from Chichewa**

Kenyon Branan & Colin Davis MIT

> A growing body of work argues that Agree has the effect of "unlocking" certain domains (*phases*) such that otherwise illicit extraction from them becomes permitted. However, existing proposals disagree on whether Agree is in fact always required to Unlock phases for extraction, or only required for extractions that would otherwise bypass the phase edge. We argue that constraints on extraction from DP in Chichewa reported in Mchombo (2004, 2006) provide evidence for the latter theory, which privileges phase edges. We go on to show that this theory makes correct predictions about Dinka, the language Van Urk & Richards (2015) originally argued provides evidence for the opposite conclusion.

## **1 Introduction**

In this paper, we argue that Chichewa (Bantu) informs us about the conditions on movement in syntax. A small body of work has argued that one such condition involves Agree: that is, certain domains (*phases*) must be targeted by Agree and hence *Unlocked* before a movement operation can extract something from them, at least some of the time. As it stands, there are two theories about when Unlocking is necessary for extraction:

1. *either/or* (Rackowski & Richards 2005, Halpert 2016, 2019, Branan 2018):

Extraction from a phase requires Agree to Unlock that phase, unless the element to be extracted can move via the phase edge.

2. *both/and* (Van Urk & Richards 2015 on Dinka):

Extraction from a phase always requires the containing phase to be Unlocked through Agree, and for extraction to pass through the edge.

While theory #1, the *either/or* theory, predicts extraction to be permitted either by moving through the phase edge or by Unlocking the phase, theory #2, the *both/and* view, predicts that both are prerequisites for extraction.

In this work, we argue that patterns of extraction from DP in Chichewa (Bantu) provide evidence for theory #1, which privileges edges. As we will see, the Chichewa patterns make it evident that extraction really is possible from the edge of a "locked" phase, which has not been agreed with. §2 provides more background on theories of extraction from phases and Unlocking. §3 provides the relevant facts from Chichewa, which in §4 we apply to the question of determining which Unlocking theory is correct. In §5 we show in more detail how this account captures the facts in Chichewa and extends to correct predictions about Dinka (Nilo-Saharan), which Van Urk & Richards originally argued favors theory #2.

### **2 Background: How to escape a phase**

#### **2.1 Phases and the edge escape hatch**

An influential idea in contemporary syntactic theory is that the derivation proceeds in phases (Chomsky 2000, 2001; a.o.). The set of phases is at least CP, vP, and likely DP as well, DP being our focus in this paper. In essence, Chomsky argues that phases constrain the syntactic derivation because operations outside of a phase cannot target elements within that phase, with one exception: phrases in the edge (specifier) of a phase remain accessible.

We see this system schematized in (1), where movement of XP directly out of the phase's complement is impossible. However, XP can escape the phase if it stops in the edge of the phase first:

(1) Must exit a phase via its edge

[ … [<sub>PhaseP</sub> … XP ] ] (movement of XP directly out of PhaseP, without stopping at its edge: ✘)

This is one hypothesis about the conditions on escaping phases, which explains why syntactic operations seem to have a cyclic, local, punctuated character.<sup>1</sup>

#### **2.2 Unlocking by Agree**

A distinct method of phase escape is proposed by Rackowski & Richards (2005), Van Urk & Richards (2015), Halpert (2016, 2019), and Branan (2018). These works argue that Agreeing with a phase Unlocks it, making it transparent for extraction. For proponents of Unlocking theory #1 previewed in the introduction, this Unlocking allows extraction to bypass the phase edge, which would otherwise be impossible. This is illustrated in (2), where the probe P Agrees with PhaseP, permitting XP to escape PhaseP without stopping in its edge.

(2) Agree with a phase allows bypassing of the phase edge

[ … P … [<sub>PhaseP</sub> … XP ] ] (Agree: P → PhaseP)

Branan (2018) presents evidence for this theory based on cross-linguistic restrictions on extraction from DP, where extraction from nominals seems to be contingent on whether or not the nominal controls agreement morphology. An example of this comes from Northern Ostyak (Uralic, Nikolaeva 1999). Northern Ostyak has obligatory agreement with subjects, but agreement with objects is optional. In this language extraction of possessors from subjects is always possible. However, extraction of possessors from objects is possible only when the object is agreed with, as the contrast between (3) and (4) shows.

(3) Possessor extraction from object is impossible without agreement \***Juwan** John motta before [ xot-əl house-3sg \_ ] kǎśalə-s-əm. see-t-1sg 'I saw John's house before.'

<sup>1</sup> For Chomsky, the accessibility of edges is attributed to the nature of spellout: a phase head spells-out its complement, thus sending it to PF and LF, and out of the syntactic derivation. Because the specifier and head of a phase are outside of the phase's complement, they remain accessible. The view of phases for which we present evidence suggests a different reason for the accessibility of phase edges, which predicts that material deeper within a phase is not inaccessible, but rather more difficult to access due to locality constraints on probing.

(4) Agreement with object permits extraction of possessor **Juwan** John motta before [ xot-əl house-3sg \_ ] kǎśalə-s-e:m. see-t-sg.**3obj** 'I saw John's house before.'

In (3) we see that possessor extraction from a direct object is not possible on its own. Rather, as (4) shows, such extraction requires the direct object to be agreed with. Under the theory we argue for in this paper, the extraction in (3) would have been grammatical if the possessor could extract via the DP edge, but evidently such a derivation is not available. (See Branan 2018 for details about how such extraction is constrained.) Consequently, Agree with the object is necessary to permit extraction. This is because Agree has the effect of Unlocking DPs, as this pattern in Northern Ostyak and similar ones across other languages indicate.

### **2.3 Two versions of Unlocking theory**

As previewed in §1 above, two variants of Unlocking theory have been proposed. Rackowski & Richards initially proposed Unlocking theory #1, which affords "escape hatch" status to phase edges, in order to connect Unlocking to an understanding of locality and successive-cyclic movement through phase edges. Later evidence for Unlocking has been indirect enough to be compatible with a reformulation eventually offered by Van Urk & Richards' (2015) work on Dinka, which we've introduced as theory #2. We argue that Chichewa verifies the predictions of theory #1, and we go on to show that Dinka actually behaves as we would expect if theory #1 is really the correct choice.

## **3 Extraction and agreement in Chichewa**

Mchombo (2004, 2006) discusses how the formation of discontinuous DPs in Chichewa is constrained by the distribution of the Object Marker (OM), a morpheme in the verbal complex which expresses the φ-features (in particular for Chichewa, the noun-class features, here represented numerically) of an internal argument. We hypothesize that these discontinuous DPs are derived by movement, and we will argue in this section that the OM involves a probe that Agrees with the φ-features of an internal argument DP. Importantly for our proposal, certain correlations hold between the possibility of extraction from DP, and whether or not the OM Agrees with that DP.

#### **3.1 When extraction and agreement go together**

Mchombo shows that in the basic case, the OM is optional in Chichewa. However, under certain circumstances the OM becomes necessary. For instance, the OM is required when extracting an adjective, as the contrast between (5) and (6) shows. In (5), the OM reflects the features of the class 4 object DP (*lions*) out of which the adjective (*aged*) has been fronted:

(5) Adjective extraction possible with OM (Mchombo 2004: ex. 21b) **Yókálamba 4sm-aged** anyání 2-baboons a-na-**í**-gúl-ílá 2sm-pst-**4om**-buy-appl-fv makású 6-hoes awa 6-these óbúntha 6sm-blunt [ mikángo **4**-lions \_ ]. 'The baboons bought the aged lions these blunt hoes.'

However, in (6) the OM is absent, and such extraction is blocked:

(6) Adjective extraction impossible without OM (Mchombo 2006: ex. 4a) \* **Zakuda 10sm-black** atsíkáná 2-girls á 2assoc mfúmu 9-chief a-a-gul-á 2sm-pfv-buy-fv [ mbûzi **10**.goats \_ ]. 'The chief's girls have bought black goats.'

In just the same way, extraction of a demonstrative requires the OM:

(7) No demonstrative extraction without OM (Mchombo 2006: ex. 2b–c) **Awa 2**.prox njuchi 10.bees izi 10prox zi-ná-**\*(wá)**-lûm-á 10sm-pst-**2om**-bite-fv [ alenje **2.**hunters \_ ópúsa 2sm.foolish ]. 'These bees bit the foolish hunters.'

Note that both adjectives and demonstratives in Chichewa originate on the right side of N(P), which sits at the left edge of the DP.

(8) The Chichewa DP: N < Dem < Adj (from Mchombo 2006: ex. 2a) alenje hunter(N) awa these(Dem) ópúsa foolish(Adj)

This pattern is comparable to that of Northern Ostyak: Certain elements in DP can only be extracted when the DP that contains them controls agreement morphology. Importantly, in contrast, certain other elements are extractable whether or not the OM agrees with the containing DP, as the next section shows.

#### **3.2 When extraction and agreement come apart**

So far, we've established that in Chichewa extraction of adjectives and demonstratives requires the OM to agree with the containing DP. As mentioned, in Chichewa the left edge of DP is occupied by N(P). As discussed further in §4, we take this as evidence that NP moves to the left edge of the Chichewa DP. Furthermore, NP is the only element of the Chichewa DP that can always be extracted, whether or not the containing DP controls the OM, as (9) and (10) below show. In (9) the OM is present, while in (10) it is absent, but both of these examples involve acceptable extraction of NP:


This fact is significant: It is unclear, given the theory of Unlocking presented in Van Urk & Richards (2015), why extraction of NP is permitted even in cases where the containing object DP is not Agreed with by the OM. Under such a theory, we expect extraction of NP to require the DP which originally contained it to always be agreed with, and thereby Unlocked – contrary to fact.

However, these Chichewa patterns are expected under a theory of Unlocking like that of Rackowski & Richards (2005) and Branan (2018), in which extraction is permitted by moving via the phase edge, or by Agree with the phase itself. In the previous subsection, we saw that extraction of adjectives and demonstratives, which originate at a non-edge position in DP, requires the OM to Agree with the containing DP. However, as we've just seen, agreement is not required when extracting NP, the element occupying the edge of the Chichewa DP.

We discuss a more specific model of extraction from DP in Chichewa in §5. Before that, in the next subsection, we present some evidence that the OM does indeed involve an Agree relation with the DP that extraction exits.

#### **3.3 The OM involves an Agree dependency**

As Baker (2016) overviews, whether the OM in Bantu languages is a pure agreement marker or a (doubled) pronominal clitic is a subject of debate. For the account proposed in this paper, the presence of the OM must be contingent on some sort of probe-goal Agree relationship, even if the morphological effect of that syntactic relationship is a pronominal clitic rather than a mere agreement marker.<sup>2</sup> We argue that the dependency between the OM and the DP it cross-references indeed shows locality effects characteristic of such probe-goal relations.

While the OM in Chichewa can normally target a direct object, certain other nominals interrupt a potential relation between the OM probe and the direct object. For example, if a benefactive indirect object is present, it must be targeted by the OM rather than the direct object, as shown in (11).<sup>3</sup>

(11) IO blocks agreement with DO (Mchombo 2004: 101, ex. 41a-b) Alenje 2-hunters a-ku-**wá/\*zí**-phík-il-á 2sm-prs-**2om/\*8om**-cook-appl-fv zítúmbûwa 8-pancakes **anyáni** . **2-baboons** 'The hunters are cooking the baboons some pancakes.'

Mchombo (2004) discusses how similar considerations apply to raised possessors, which the OM must target rather than a direct object.<sup>4</sup> As expected, an adjective or demonstrative can only be extracted from the raised possessor, and then, only when it is Agreed with:<sup>5</sup>

(12) No extraction from DO with raised possessor (Mchombo 2004: 55, ex. 22e) \* **Áákûlu** 6sm-big mkángó 3-lion u-ku-wá-dy-él-á 3sm-prs-**2om**-eat-appl-fv [ maûngu 6-pumpkins \_ ] [ amalinyé **2**-sailors ógúnata 2assoc-foolish ]. 'The lion is eating for the foolish sailors (their) big pumpkins.'

<sup>2</sup> See Preminger (2015) for an argument that some sorts of clitic doubling involve an Agree relation.

<sup>3</sup>As a reviewer points out, this effect has independent precedent in Kramer (2014), who argues that the presence of IO blocks agreement (and hence clitic doubling) with DO in Amharic.

<sup>4</sup>Mchombo states that the raised possessor behaves like the object of the clause in other ways, such as accessibility to passivization. This is also expected if the raised possessor is more local than the direct object to higher probes, thus passivization (presumably involving probing by T and subsequent A-movement) targets the raised possessor rather than the lower direct object.

<sup>5</sup>The benefactive arguments and raised possessors that intervene between the OM probe and the direct object nevertheless appear to the right of that object. What is necessary for our account is that these DPs are structurally above the object at the relevant point in the derivation where the OM Agrees with its goal, regardless of their final linear position.

(13) Extraction possible from raised possessor (Mchombo 2004: 56, ex. 22g) **Ógúnata** 2assoc-foolish mkángó 3-lion u-ku-wá-dy-él-á 3sm-prs-**2om**-eat-appl-fv [ maúngú 6-pumpkins áákûlu 6sm-big ] [ amalinyêlo **2**-sailors \_ ]. 'The lion is eating for the foolish sailors (their) big pumpkins.'

Further locality effects emerge from the interaction of extraction with recursive possession. Example (14) shows a recursive possession configuration:

(14) Recursive possession (Mchombo 2004: 60, ex. 29) Anyaní 2-baboons á 2assoc mísala 4-madness a-ku-chí-phwány-a 2sm-prs-7om-smash-fv [3 **chipanda** 7-calabash [2 **chá** 7assoc **kazitápé** 1a-spy [1 **wá** 1assoc **alenje** 2-hunters ]]]. 'The mad baboons are smashing the calabash of the hunters' spy.'

There are restrictions on what can be extracted from recursive possession structures. As we see in (15), it is possible to extract the possessor (DP2) of the direct object (DP3) which controls OM, in this case *chipanda* ('calabash').

(15) The "outermost" possessor may be extracted (Mchombo 2004: 60, ex. 30a) [2 **Chá** 7assoc **kazitápé** 1a.spy [1 **wá** 1assoc **alenje** 2-hunters ]] Anyaní 2-baboons á 2assoc mísala 4-madness a-ku-chí-phwány-a 2sm-prs-**7om**-smash-fv [3 **chipanda 7**-calabash \_ ]. 'The mad baboons are smashing the calabash of the hunters' spy.'

In contrast, as we see in (16), it is not possible to extract DP1, the possessor of the possessor (DP2) of the element that controls the OM (DP3).

(16) The "innermost" possessor may not extract (Mchombo 2004: 61, ex. 32) \* [1 **Chá** 7assoc **alenje**] 2-hunters Anyaní 2-baboons á 2assoc mísala 4-madness a-ku-chí-phwány-a 2sm-prs-**7om**-smash-fv [3 **chipanda 7**-calabash [2 **chá** 7Assoc **chiphadzúwá** 7-beauty queen \_ ]]. 'The mad baboons are smashing the calabash of the hunters' beauty queen.'

We've seen that extraction from the non-edge of the Chichewa DP requires the containing DP to be agreed with by the OM. In (16), notice that the OM has class 7 φ-features that could have been received via probing of the direct object *calabash*, or the possessor of the direct object, *beauty queen*, both of which are of noun class 7. Thus (16) should be able to represent a structure where the OM agreed with the possessor of the direct object, if it were possible to do so. Such agreement should Unlock the possessor of the direct object, thus permitting extraction of the possessor of the possessor. However, such extraction is impossible.

This fact suggests that in (16) only the direct object can have been agreed with. Thus the direct object is Unlocked and its possessor can be extracted, as in (15). However, this possessor was not agreed with and thus remains locked, so the possessor of the possessor remains un-extractable, as (16) has shown. This is as expected if locality requirements only allow the OM to agree with and hence Unlock the structurally closest DP, the direct object, which the OM probe will necessarily encounter before any DPs which are its sub-constituents.<sup>6</sup>

The restrictions we've seen in this subsection are expected if the OM involves a probe-goal Agree relation, which is constrained by locality considerations like the Minimal Link Condition (Chomsky 1995, 2000) or Relativized Minimality (Rizzi 1990) which force Agree operations to target the closest possible goal.
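The locality reasoning in this subsection can be made concrete with a small sketch. The Python below is an illustrative model only: the `Node` class, the use of breadth-first search as a stand-in for "closest goal", the `unlocked_by` helper, and the bracketing assumed for the applicative structure are all simplifying assumptions introduced here, not a formal definition of the Minimal Link Condition or of Mchombo's analysis.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A hypothetical syntactic node; `is_dp` marks potential goals for the OM probe."""
    label: str
    is_dp: bool = False
    children: List["Node"] = field(default_factory=list)


def closest_dp(root: Node) -> Optional[Node]:
    """Return the first DP met in a top-down, breadth-first search below `root`.

    The search stops at the first DP, so DPs embedded inside it are never reached.
    """
    queue = deque(root.children)
    while queue:
        node = queue.popleft()
        if node.is_dp:
            return node
        queue.extend(node.children)
    return None


def unlocked_by(goal: Optional[Node], host: Node) -> bool:
    """Non-edge material can leave `host` only if `host` is the DP that was Agreed with."""
    return goal is host


# Recursive possession as in (14): DP3 contains DP2, which contains DP1.
dp1 = Node("DP1 (hunters)", is_dp=True)
dp2 = Node("DP2 (spy)", is_dp=True, children=[dp1])
dp3 = Node("DP3 (calabash)", is_dp=True, children=[dp2])
vp = Node("VP", children=[Node("V"), dp3])

goal = closest_dp(vp)
print(goal.label)              # DP3 (calabash)
print(unlocked_by(goal, dp3))  # True: DP2 can be extracted from DP3, as in (15)
print(unlocked_by(goal, dp2))  # False: DP1 cannot be extracted from DP2, as in (16)

# Benefactive configuration as in (11): the indirect object is assumed to be higher.
io = Node("IO (baboons)", is_dp=True)
do = Node("DO (pancakes)", is_dp=True)
applp = Node("ApplP", children=[
    io,
    Node("Appl'", children=[Node("Appl"), Node("VP", children=[Node("V"), do])]),
])
print(closest_dp(applp).label)  # IO (baboons)
```

Because the search halts at the first DP it meets, the model never Agrees with an embedded possessor or with a direct object below an indirect object, which mirrors the pattern in (11) and (14)–(16).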

## **4 Deciding between Unlocking theories**

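Recall that we are comparing two theories regarding Unlocking:

1. *either/or*: Extraction from a phase requires Agree to Unlock that phase, unless the element to be extracted can move via the phase edge.

2. *both/and*: Extraction from a phase always requires the containing phase to be Unlocked through Agree, and for extraction to pass through the edge.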

In the previous section, we saw that extraction from DP in Chichewa is subject to a particular requirement: Extraction from DP requires that DP to be agreed with by the OM, unless the extracted element is N(P), which normally appears at the left edge of DP. While the *both/and* theory of Unlocking (#2) does not accommodate this absence of agreement in the latter case, the *either/or* version of Unlocking theory (#1) does.

<sup>6</sup>The same asymmetries hold for left branch extraction in Russian (p.c. Tanya Bondarenko). This is as expected if the mechanisms we see evidence for on the surface of Chichewa are in fact more general aspects of syntax, and not mere idiosyncrasies of Chichewa.

Theory #1 leads us to expect the Chichewa facts, given an analysis in which the NP-initial order of the Chichewa DP is derived by movement of NP to the edge of DP. Such movement is proposed for independent reasons in Cinque (2005), and for the Kordofanian language Moro in Jenks (2010), whose DP structure is analogous to that of Chichewa.

(17) NP movement to spec-DP<sup>7</sup> [ **NP** D Dem Adj \_ ]

Because NP occupies the edge of the Chichewa DP, agreement is not required for its extraction. NP can simply be freely extracted even when DP is not Unlocked, as (10) showed. In contrast, we saw that the extraction of elements from the non-edge of DP (adjectives, demonstratives) requires the Unlocking effect of agreement. The next section discusses the derivations involved in detail.
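The comparison between the two theories can be sketched as a pair of Boolean conditions checked against the Chichewa judgments reported above. This is a toy encoding under stated assumptions: the function names, the `DATA` table, and the decision to treat adjectives as unable to pass through the DP edge (following the analysis developed in §5.1) are introduced here for illustration. The argument in the text against the *both/and* view rests only on the NP row without the OM; the adjective row flagged for that theory below is a by-product of this particular encoding.

```python
def either_or(via_edge: bool, agreed: bool) -> bool:
    """Theory #1: extraction is licensed by the phase edge OR by Unlocking."""
    return via_edge or agreed


def both_and(via_edge: bool, agreed: bool) -> bool:
    """Theory #2: extraction needs the phase edge AND Unlocking."""
    return via_edge and agreed


# (extracted element, sits at DP edge?, OM agrees with DP?, extraction attested?)
DATA = [
    ("NP", True, True, True),       # example (9)
    ("NP", True, False, True),      # example (10)
    ("Adj", False, True, True),     # example (5)
    ("Adj", False, False, False),   # example (6)
]


def mismatches(theory):
    """List the (element, OM present) cases where a theory's prediction differs from the data."""
    return [
        (element, agreed)
        for element, via_edge, agreed, attested in DATA
        if theory(via_edge, agreed) != attested
    ]


print(mismatches(either_or))  # []
print(mismatches(both_and))   # [('NP', False), ('Adj', True)]
```

Under these assumptions, the *either/or* condition matches all four judgments, while the *both/and* condition wrongly excludes NP extraction in the absence of the OM.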

### **5 Predicting the facts**

#### **5.1 Chichewa**

We have seen that agreement with DP is necessary for extraction from the non-edge of the Chichewa DP. As mentioned, we argue that this is because Agree with the DP Unlocks it for further probing. That Unlocking allows an A′-probe to subsequently search past the edge of DP, as the tree in Figure 1 shows.

If an extracting adjective or demonstrative were able to pass through spec-DP, the edge of this phase, Agreement with DP should not be necessary for extraction. However, this position is occupied by NP in Chichewa, precluding movement to this "escape hatch" position.<sup>8</sup> As a result, the Chichewa DP must be Unlocked through Agree if anything but NP is to be extracted from it.

In contrast, because NP sits at the DP edge, agreement with DP is not a prerequisite for its extraction, as we saw in (10). This is what we expect in a theory like that of Rackowski & Richards (2005), who argue that while locality considerations require probes to target the closest available goal first, a phase label and its highest specifier (the edge) are equidistant with respect to higher probes. This means that higher A′-movement probes can freely target either the phase itself, or material that may happen to be present in its edge. Thus in Chichewa, an A′-probe can extract NP from DP even when DP is not Unlocked. However, no harm is done if the containing DP happens to be agreed with and consequently superfluously Unlocked, as in (9), where both agreement and NP extraction occur.

Figure 1: Deep A′-extraction fed by φ-Agreement with DP

<sup>7</sup>A reviewer mentions that cardinal numerals might be expected to be pre-nominal, in which case, they might pose an issue for this analysis. Mchombo (2004) in fact shows that cardinal numerals are post-nominal, as usual for adjectives in this language, hence there is no problem here. A reviewer also asks whether there are any adjective ordering constraints in Chichewa, mentioning that the analysis presented here predicts the relative ordering of adjectives to be essentially the same as in a language like English. At the time of writing, we do not have access to information about this, so we leave this question aside for now.

<sup>8</sup> Implicit here is the claim that Chichewa permits only one spec-DP. Alternatively, we could claim that Chichewa D lacks an A′-probe, or that anti-locality constraints block the relevant elements from moving through the DP edge. Additionally, this ban on exiting a phase which something has already moved into the specifier of is analogous to *wh*-island effects, where an initial *wh*-movement into spec-CP blocks extraction of a second *wh*-phrase.

#### **5.2 Resolving Dinka**

Van Urk & Richards (2015) proposed that extraction from phases requires Unlocking in addition to moving via the phase edge, based on the interaction of extraction out of embedded CPs and EPP effects in Dinka. This result is in conflict with the theory we have argued for based on Chichewa. However, further examination reveals a resolution to this conflict: extraction from CP in Dinka always requires Unlocking to take place, because elements that undergo extraction never reach the edge of the CP phase in this language.

Van Urk & Richards show that Dinka has two positions in the clause which, in the basic case, must be filled. These are claimed to be spec-CP and spec-vP. Spec-vP must be filled by some internal argument, as we see in (18–20).


The subject of the sentence – the element which controls agreement morphology – must occupy spec-CP. We see evidence of this in (21), where unlike the above examples, the subject remains in situ in vP, resulting in ungrammaticality:

(21) \* Spec-CP unfilled in Dinka (Van Urk & Richards 2015: ex. 33d) \*[ \_ a-cíi 3sg-pfv [ Bòl Bol lɛḱ̤ tell Dɛ̀ŋ Deng alkókôl story ] ]. 'Bol told Deng a story.'

Van Urk & Richards further observe that extraction out of an embedded clause appears to satisfy all such EPP positions passed by that movement, which consequently end up unfilled on the surface. We see this in (22), where the EPP positions identified in (18–21) are empty, having been crossed by *wh*-movement:<sup>9</sup>

(22) *wh*-movement satisfies EPP positions passed (Van Urk & Richards 2015: ex. 37) **Yeŋà** Who cíi pfv.ns Yâ̤a̤r Yaar.gen \_ lɛḱ̤ tell Dɛ̀ŋ, Deng [ yé C \_ cíi pfv.ns Bôl Bol.gen \_ tuɔ̀ɔc send \_ wṳ́ṳt cattle.camp.loc ]. 'Who did Yaar tell Deng that Bol sent to the cattle camp?'

<sup>9</sup>A reviewer asks whether the subject DPs in (22) might inhabit spec-vP, given that they do not move into spec-CP in this derivation. Van Urk & Richards are not fully explicit about the position of the subject in these cases, but it appears implicit that subjects occupy a position above spec-vP, presumably spec-TP.

Van Urk & Richards argue that the extracting *wh*-phrase passes through and satisfies the EPP requirement of the spec-vP and spec-CP of the embedded clause. However, they argue that the embedded CP itself satisfies the EPP for the matrix v. They suggest that the embedded CP moves to spec-vP (subsequently extraposing to the right) due to being Agreed with by v in order to Unlock that CP for extraction. If v had to Agree with CP to Unlock it even though *wh*-extraction passed through the CP edge as (22) indicates, it suggests that both movement via the phase edge and Unlocking are required for extraction out of a phase. This finding is contrary to the theory of Unlocking we have argued for here.

To maintain Van Urk & Richards' analysis of the Dinka derivation and keep Dinka consistent with the theory we argue for here, we might hypothesize that elements extracted from an embedded CP in Dinka do not actually pass through the true CP edge. If this is the case, CP will need to be Unlocked before a moving phrase can exit it. There is in fact evidence that there is more structure above the high EPP position in CP that moving phrases pass through: Namely, this position can be preceded by an overt complementizer. We saw this in (22) above, where the gap in the edge of the embedded CP is preceded by the C *yé*. We can independently see this post-C position filled by the subject in non-extraction contexts, as in (23–24):

(23) EPP position in CP preceded by C *ke* (Van Urk & Richards 2015: ex. 4a) A-cá 3sg-pfv.1sg táak, think **ke** C Cà̤n Can bí fut wít wrestling tíaam. win.tr

'I think that Can will win the wrestling.'

(24) EPP position in CP preceded by C *ye* (Van Urk & Richards 2015: ex. 4b) A-cá 3sg-pfv.1sg luéel, say **ye** C Cà̤n Can bí fut wít wrestling tíaam. win.tr 'I said that Can will win the wrestling.'

This is the very position that can be unfilled in extraction contexts, which as Van Urk & Richards argue, is because a phrase being extracted from CP passes through it. But this peripheral position in the Dinka CP is evidently not at the very edge of CP. Therefore Unlocking of CP is still required for extraction.

In sum, Dinka as analyzed by Van Urk & Richards in fact behaves as the account of Unlocking that we have argued for predicts. In Dinka, Unlocking is required for all extraction from CP, because there is no escape hatch at the edge of CP for extraction to pass through. Rather, there is only an EPP position that is not at the true edge.<sup>10</sup>

<sup>10</sup>A reviewer asks why Van Urk & Richards take a position below the overt complementizer to be spec-CP. Van Urk & Richards propose an extended left periphery in Dinka with at least two CP layers, in which only the lower CP counts as a phase. For them, the EPP position in the left periphery that we have discussed is therefore the specifier of the relevant phase. In this paper, we posit that it is in fact the higher CP layer that is a phase, and that there is no successive-cyclic movement through that upper CP; thus Unlocking is required to permit extraction from this domain. We argue that movement through the specifier of the higher CP is banned because movement to this position from the EPP position in the lower CP would be too short, following the formulation of anti-locality in Erlewine (2016, a.o.).

## **6 Conclusion**

Chichewa's typically optional object agreement gives suggestive evidence for a particular view of the constraints on cross-phasal extraction – in particular, one in which Agree allows extraction to bypass the edge of a phase, but in which movement to the edge of a phase is also sufficient to escape the phase. A potential contradiction of this theory presented by Dinka proves to be unproblematic: Agree with embedded CPs is required for extraction from them in Dinka because the peripheral position in CP available for phrasal movement is not at the true edge of the embedded clause. These results are consistent with a theory in which Unlocking is only required for "deep" extraction out of phases.

## **Abbreviations**


## **Acknowledgements**

Authors listed alphabetically. Thanks to Norvin Richards, Sabine Iatridou, David Pesetsky, and the audiences of ACAL 49 and WCCFL 46 for helpful comments. All errors are each other's.

## **References**

Baker, Mark C. 2016. *On the status of object markers in Bantu*. Manuscript.

Branan, Kenyon. 2018. Attraction at a distance: Ā-movement and Case. *Linguistic Inquiry* 49. 409–440. DOI: 10.1162/ling\_a\_00278.

Chomsky, Noam. 1995. *The Minimalist program*. Cambridge, MA: MIT Press.


# **Chapter 9**

# **A syntactic analysis of the co-occurrence of stative and passive in Kiswahili**

Yan Cong & Deo Ngonyani

Michigan State University

This study concerns the co-occurrence of stative and passive in Kiswahili. The co-occurrence is only possible with an intervening applicative suffix and in the order st-appl-pass. There are two readings of the stative extension in Kiswahili, potential and resultative. The study seeks to account for the co-occurrence, the order of the suffixes, and the two interpretations of the stative. Our findings are consistent with the [VoiceP [ApplP [vP [VP]]]] structure. We argue that passive and stative share the same essential structure [Voice, Appl, v]. As to the derivation, we propose syntactic head movement where V moves to the stative head resulting in [V-st], which moves to the applicative yielding [V-st-appl], and finally moves to Voice to form [V-st-appl-pass]. Finally, our account connects the stative with *patient-manner* predicates to derive the resultative reading, and with *agent-manner* predicates to derive the potential reading.

## **1 Introduction**

This study examines the co-occurrence of stative *-ik-* and passive *-w-* in Kiswahili. The two are part of derivational morphology commonly known in Bantu linguistics as verb extensions (Guthrie 1962). The following sentences illustrate the contrast between active, passive and stative clauses.

(1) a. m-toto a-li-mwag-a ma-ziwa.
1-child 1sm-pt-spill-fv 6-milk
'the child spilled the milk' (active)

Yan Cong & Deo Ngonyani. 2022. A syntactic analysis of the co-occurrence of stative and passive in Kiswahili. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 165–181. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393748


Sentence (1a) is the active sentence with two arguments, the agent *mtoto* 'child' in subject position, and the theme *maziwa* 'milk' in object position. The verb in the passive construction (1b) is marked with the passive suffix *-w-*, and has the theme in subject position while the agent is an oblique object. The stative suffix *-ik-* marks the verb in (1c) where the theme is in the subject position, and there is no agent.

The passive has two allomorphs in Kiswahili, *-ew-* and *-w-*. The stative may be realized as *-ik-*, *-ek-*, or *-k-*.<sup>1</sup> These morphosyntactic changes are summarized in Table 1.


Table 1: Features of passives and statives

As indicated in Table 1, the active verb selects two arguments: agent and theme. Both passives and statives change the verb's valence by reducing the external argument (henceforth EA). They both select the theme. Notice that here theme refers to the prototypical object. The theme becomes the subject of the derived sentence. In examples (1b) and (1c) above, the subject triggers the subject prefix *ya-* which agrees with *maziwa* 'milk'.

<sup>1</sup> In this paper we do not include *-ik-* in Chewa that is identified by Simango (2009) as causative disguised as stative as in *gona* 'sleep' and *goneka* 'lay someone down'. This increases the valency. It is also different from Kiswahili impositive according to Schadeberg & Bostoen (2019), as in *choma* 'stab' and *chomeka* 'insert'. This affix in Kiswahili does not change the argument structure.

On the surface, passives and statives share similarities in that they "eliminate" EA, promote the logical object to the subject position, and trigger subject agreement. However, there are fundamental differences that call for a closer look. One of the differences is in interpretation. There are two readings of the stative extension *-ik-* in Kiswahili: "potential" denoting activity or state; and "resultative" indicating accomplishment or achievement (1c) (Simango 2009, Levin 1993, Vendler 1967, a.o.). Such readings are not available for passives. Illustrated in (2) is an example showing the co-occurrence.

(2) a-li-mwag-*ik*-i-*w*-a maji
1sm-pst-spill-st-app-pass-fv water
'he got water spilled on him'

Another difference is that the passive does not delete the external argument and allows it to be expressed as an oblique object while the stative appears to eliminate the EA altogether. Thus, it appears the passive selects the theme and the agent, while the stative selects the theme only. Also intriguing is the fact that the passive and the stative can co-occur, as in (2). Since both are derived from transitive verbs, the derivation of both calls for an explanation.

This paper's main claim is that although both passives and statives appear to eliminate EA, this happens in different steps of the derivation. This claim is built upon the assumption that various mergers occur at different steps. In other words, where you get passive merger, stative merger is not expected. The different readings of statives are also derived from different statives. Furthermore, it is only the resultative stative but not the potential stative that can co-occur with passives in Kiswahili. Resultative statives are, in fact, disguised causatives. This finer-grained subcategorization allows the analysis to make better predictions over a larger amount of data. Adopting Hale & Keyser (2002), we propose that the stative extension licensing subject promotion is captured as patient-manner predication, while the stative extension blocking subject promotion is analyzed as agent-manner predication.

## **2 Previous studies**

The differences between passive and stative have been a subject of much interest among Bantuists. In this section, we tap into the insights from previous studies on meaning, modularity, argument structure, and the co-occurrence of passive, stative, and applicatives.

#### **2.1 Meanings**

While the passive does not present a range of meanings in Bantu, the stative generates a range of meanings that have led to its being referred to also as neuter or neutro-passive according to Schadeberg & Bostoen (2019: 179). The stative in Kiswahili is associated with two meanings, namely, state and potential (Ashton 1947, Polomé 1967, Schadeberg & Bostoen 2019). These are illustrated by the following examples from Ashton (1947: 227–228).

(3) b. Kazi hii ya-fany-ik-a
9.work9 this 9sm-do-st-fv
'This work can be done' (potential)

(3a) denotes a state of affairs resulting from some event. (3b) refers to the possibility or potential of the event taking place. While the past tense and perfect generally generate stative readings, present tense often leads to potential reading. The present tense often leads to ambiguity due to the availability of both interpretations.

The interpretation of the stative is also sensitive to aspect. Using Vendler's aspectual types of verbs (Vendler 1967), it is possible to discern which verbs receive which interpretation. Vendler classified verbs according to two dimensions, namely, whether or not the verb denoted a process event and whether or not the event had an end point.


These types are presented in Table 2.


Table 2: Aspectual types of verbs (based on Vendler 1967)

Potential readings are associated with verbs denoting events that do not have an endpoint (Dubinsky & Simango 1996). Therefore, verbs expressing activities and state have potential reading in terms of event types when the stative is attached. By contrast, state readings are derived from stativized verbs denoting events with endpoints. Those readings are related to accomplishments and achievements, both of which express a change of state. We shall refer to such reading as "resultative" and use "state" for Vendler's type of verb. The "stative" is the verbal suffix or the construction with such a verb. In the stative construction, the agent of the action does not appear, as examples in (4) demonstrate in Chichewa, a Bantu language related to Kiswahili.

(4) a. Shuko a-dza-thyol-a ndodo
Shuko 1.agr-fut-break-fv 9.stick
'Shuko will break a stick'

b. Ndodo i-dza-thyo-k-a
9.stick 9.agr-fut-break-st-fv
'A stick will break'

While (4a) asserts that someone will break a stick, (4b) merely asserts that the event will come to pass without implying that an agent will be involved in bringing about the event, which is what the stative suffix *-ik-* means.
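The generalization above about aspectual class and stative reading can be restated as a small lookup. The following is only an illustrative sketch: the feature names `process` and `endpoint`, the feature values assigned to the four classes (a standard Vendler-style cross-classification is assumed here), and the function name `stative_reading` are assumptions introduced for exposition and are not taken from the sources cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectualType:
    name: str
    process: bool   # does the verb denote a process?
    endpoint: bool  # does the event have an end point?


# Assumed feature values for the four Vendler classes (hypothetical encoding).
VENDLER_TYPES = {
    "state": AspectualType("state", process=False, endpoint=False),
    "activity": AspectualType("activity", process=True, endpoint=False),
    "accomplishment": AspectualType("accomplishment", process=True, endpoint=True),
    "achievement": AspectualType("achievement", process=False, endpoint=True),
}


def stative_reading(verb_type: str) -> str:
    """Reading of a stativized verb: events with an end point yield the
    resultative reading, events without one yield the potential reading."""
    aspect = VENDLER_TYPES[verb_type]
    return "resultative" if aspect.endpoint else "potential"


for name in VENDLER_TYPES:
    print(name, "->", stative_reading(name))
```

Only the `endpoint` feature decides the reading in this sketch, which mirrors the claim that potential readings arise with states and activities and resultative readings with accomplishments and achievements.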

Seidl & Dimitriadis (2003) present evidence from aspect and argue that the stative generates a middle construction or impersonal construction.

(5) a. Ch-akula ki-li-kuwa ki-me-pik-ik-a sana
7-food 7sm-pst-be 7sm-prf-cook-st-fv very
'The food was being much cooked'

b. Ki-tabu ki-li-kuwa ki-me-zungumz-ik-a sana katika mi-aka ya 70
7-book 7sm-pst-be 7sm-prf-discuss-st-fv very in 4-year 4.of 70s
'The book was being discussed much in the seventies.'

The eventive verbs *pika* 'cook' and *zungumza* 'speak' are combined with the stative suffix to create attributive readings. Dubinsky & Simango (1996) examine stative and passive constructions in Chichewa. The Chichewa facts draw parallels with the differences between English adjectival passives and verbal passives. The stative is similar to adjectival passives. As in English, the question is whether the two are derived similarly. The interaction with tense and aspect provides some clues regarding its derivational history.

#### **2.2 Argument structure**

One of the insights provided by the previous studies contrasting passive and stative in Bantu is that although the passive and the stative are considered detransitivizing affixes, they differ in a very fundamental way in that while the passive suppresses the logical subject, the stative deletes it (Dubinsky & Simango 1996, Mchombo 1993, Seidl & Dimitriadis 2003). The deletion is observed in several constructions. One such feature is the availability of the agent or logical subject. The passive allows for the agent to be expressed as an oblique object. No such oblique object is available for stative, as illustrated in (6).

(6) a. Mbûzi zi-na-pínd-á mǎta.
10.goats 10sm-pst-bend-fv 6.bows
'The goats bent the bows.'

b. Ma-ǔta a-na-pindidw-á ndí mbûzi.
6-bows 6sm-pst-bend-pass-fv by 10.goats
'The bows were bent by the goats.'

c. Ma-ǔta a-na-pínd-ík-a (\*ndí mbûzi).
6-bows 6sm-pst-bend-st-fv (\*by 10.goats)
'The bows got bent (\*by the goats).'

Both the passive (6b) and the stative (6c) have the theme as the subject. But only the passive allows the *by*-phrase. In the stative, a *by*-phrase is ungrammatical, as (6c) shows.

Mchombo (1993) further demonstrates that passive constructions bear an implicit argument while there is no such implicit argument in stative constructions. This difference can also be shown in Kiswahili using subject-oriented adverbs and purpose clauses.

(7) a. Wa-li-vunj-a jengo makusudi
2sm-pst-demolish-fv 5.building deliberately
'They deliberately demolished the building'


Sentence (7b) with the adverb 'deliberately' implies that an agent exists. The adverb relates to the agent. By contrast, (7c) is ungrammatical precisely because there is no agent associated with the event. Often the stative is used in impersonal constructions which have no agent.

In his study of the stative in Chichewa, Mchombo (1993) claims that the stative construction seems to fall in with the class of unaccusatives. Supporting evidence comes from its tolerance of locative inversion, as illustrated in (8).

(8) a. Zitseko zi-a-pind-*ik*-a mu chitsime.
8.doors 8sm-prf-bend-st-fv 18.in 7.waterhole
'The doors have got bent in the waterhole (well)'

b. M'chitsime mu-a-pind-*ik*-a zitseko.
18.7.waterhole 18sm-prf-bend-st-fv 8.doors
'In the waterhole some doors have got bent'

The two sentences are built on the stative verb *pindika* 'get bent'. The subject in (8a) is *zitseko* 'doors,' which is the theme of the verb. This Class 8 noun has triggered the subject agreement on the verb (8SM). In (8b), on the other hand, the subject is not the theme. The location *m'chitsime* 'in the waterhole' (Class 18) is the subject triggering the subject marker 18SM on the verb. The theme is in the postverbal position.

In a sense, this locative inversion example makes the stative less of an isolated phenomenon. This is because it shows that the stative does behave like an unaccusative. To the extent that the inverted locative functions as the subject, the grammatical subject need not correspond with the logical subject. Thus, it appears advisable to subsume the stative into the phenomenon of unaccusativity, extending to it whatever formal apparatus the theory of grammar has deployed for dealing with unaccusatives.

However, the unaccusatives are intransitive in their basic underived forms whereas the stative is a derived form. Therefore, while the statives can be accommodated within the unaccusativity phenomena, it is their derivation from the transitive verbs and their relation to them that demands the appeal to formal devices other than those employed in the analysis of unaccusativity.

Further insights into the passive construction are obtained from Dubinsky & Simango (1996) on Chichewa. The Chichewa facts draw parallels with the differences between English adjectival passives and verb passives. The stative is similar to adjectival passives. As in English, the question is whether the two are derived similarly. Using a modular theory of grammar, they suggest that although the two derivations are thematically similar, they are produced in two formally distinct operations. They examine passive and stative constructions in Chichewa and seek to motivate two distinct types of verbal extension. In their analysis, passive alters mapping from arguments to grammatical functions (GFs) while stative performs an entirely analogous operation on the lexical conceptual structure (LCS) itself. The stative has changed the lexical conceptual structure of the verb. The passive, on the other hand, does not change the lexical conceptual structure of the verb. It is an operation on the mapping of arguments to the grammatical functions.

The important lesson from this study is that although passives and statives look somewhat similar on the surface, they exhibit syntactic properties that point to the operations being carried out in different modules. We believe that this has a bearing on the syntax and the semantics of the stative morpheme *-ik-*.

#### **2.3 The ordering of the suffixes**

The wealth of extensions and combinations raises the questions of how the affixes are ordered and what principles underlie their ordering. Ngonyani (2016) addresses two questions on Kiswahili verb extensions: (i) what is the order of the extensions in relation to the applicative, and (ii) how can the order be accounted for. The article seeks to establish the positions of extensions relative to the applicative in Kiswahili, and to determine the extent to which semantic scope can account for the pairwise combinations with applicatives. Using data from the Helsinki Corpus of Kiswahili, the following combinations were found.

Table 3 shows that the search for pairwise combinations of extensions with the applicative revealed three distinct patterns. In the first pattern, the applicative can appear in a variable affix order with the causative and reciprocal extensions. In the second pattern, the applicative appears after the reversive and does not precede it. In the third pattern, the applicative appears after the stative suffix and before the passive. The study further supports the Mirror Principle (Baker 1985) and the semantic scope hypothesis (Rice 2000).


Table 3: A summary of the attested and unattested pairwise combinations (Ngonyani 2016: 65)

These three empirical observations provide three arguments supporting the syntactic-semantic account. First, the variable order is attributable to variable scopal relationships. Second, the reversive must appear between the root and the applicative. Third, the stative and passive extensions, both of which suppress the agent, occupy different positions: the passive promotes the applied object, whereas the stative promotes the direct object. This corresponds to the passive taking scope over the applicative verb, whereas the stative has a narrower scope and falls under the scope of the applicative.

To sum up, previous studies have shown that there are two readings of the stative: state or result, and potentiality. The meanings are subject to the aspectual type of the verb and to tense. The studies have also shown that while the passive suppresses the external argument, making it an implicit argument, the stative eliminates it altogether. This makes the stative similar to middle constructions. With respect to the order of the verbal suffixes, the passive appears after the applicative, while the stative appears before the applicative.
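The three ordering patterns reported for pairwise combinations with the applicative can be restated as a simple lookup. The sketch below is illustrative only: the labels (`caus`, `rec`, `rev`, `st`, `pass`, `appl`), the table `POSITIONS`, and the function `pair_attested` are hypothetical names, and the sketch assumes that any order not described above as attested counts as unattested.

```python
# Positions each extension may occupy relative to the applicative,
# following the three patterns described in the text (assumed encoding).
POSITIONS = {
    "caus": {"before", "after"},  # variable order with the applicative
    "rec": {"before", "after"},   # variable order with the applicative
    "rev": {"before"},            # reversive precedes the applicative
    "st": {"before"},             # stative precedes the applicative
    "pass": {"after"},            # passive follows the applicative
}


def pair_attested(first: str, second: str) -> bool:
    """Return True if the two-suffix sequence first-second fits the patterns.

    Exactly one of the two suffixes must be the applicative ("appl").
    """
    if first == "appl" and second != "appl":
        return "after" in POSITIONS[second]
    if second == "appl" and first != "appl":
        return "before" in POSITIONS[first]
    raise ValueError("exactly one suffix must be the applicative")


for pair in [("st", "appl"), ("appl", "st"), ("appl", "pass"), ("pass", "appl")]:
    print("-".join(pair), pair_attested(*pair))
```

Under these assumptions, st-appl and appl-pass are accepted while appl-st and pass-appl are rejected, which is the configuration that makes the st-appl-pass sequence possible.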

### **3 Assumptions**

We follow the proposal by Folli & Harley (2007), Legate (2014), and Pylkkänen (2008) that there are two functional projections within the verb phrase, namely, VoiceP and vP. The three layers are shown in Figure 1.

The internal argument is introduced in the VP where it is assigned its theta role. The vP is the locus of causative semantics. VoiceP is the domain of the external θ-role where the feature passive or active is specified. In this structure, the theme is generated in the VP, while the external argument is generated in the specifier of the vP.

Figure 1: Three layers of verb phrase

We also assume that theta roles are specified in syntactic positions. We adopt the uniformity of theta assignment hypothesis (UTAH) regarding the positions of arguments (Baker 1988). The hypothesis states:

(9) Identical thematic relationships between items are represented by identical structural relationships between those items at the level of D-structure. (Baker 1988: 46)

According to this hypothesis, the themes in the following two sentences are generated in the same position.

(10) b. The ice melted away.

The verb *melt* has as its theme *the ice* in both sentences. However, in (10a), *the ice* is the object of this transitive verb. In (10b), *the ice* is the subject. The subject in (10a) is the cause of the state that is expressed in (10b). The sentence with only the theme as its argument is anticausative. Anticausative constructions express a change of state. They are intransitive and are characterized by the elimination of the agent and the promotion of the theme to subject position (Dom et al. 2016, Haspelmath 2016, Heidinger 2015, Kulikov 2011).

The stative in Bantu languages exhibits features of the anticausative (Dom et al. 2016, Gluckman & Bowler 2016, Mallya & Visser 2019). The differences between passives and anticausatives also distinguish passives and statives. For this reason, our discussion will treat the stative as anticausative. Furthermore, we assume that the derivational suffixes for passive, stative, and applicative are syntactic heads that merge in the syntax. Several studies of Bantu verbal derivations (Baker 1985, 1988, Harley 2013, Mallya & Visser 2019, Ngonyani 2016, Pylkkänen 2008, Seidl & Dimitriadis 2003) have made a similar argument about the status of the verb extensions.

## **4 Derivation**

This section provides details of the derivations. In particular, we attempt to address the following questions:


In order to account for (a), we must first establish the positions of the syntactic heads. The features of *v* are consistent with the stative. It selects VP and specifies causation or volition. When the passive is specified on the Voice, it suppresses the external argument. Voice selects a complement that is [+transitive].

Since Voice selects only [+transitive] complements, it cannot select a constituent that is [−transitive]. The passive and the stative should therefore not co-occur. However, consider an example where such a co-occurrence does occur.

(11) a-li-mwag-*ik*-i-w-a maji
1sm-pst-spill-st-app-pass-fv water
'he got water spilled on him'

In this example (11), the stative and passive co-occur but with an intervening applicative. The goal of the spill appears as the subject of the sentence.

The Kiswahili example shows parallels with the English sentences built on the verb *melt*. While in (10a) the verb takes the theme *ice* as its object and *the heat* as the causer, in (10b) the verb takes only the theme as its argument. The theme appears in the subject position in (10b). Previous studies of applicatives have established that the applied object is generated in a position higher than the theme or patient based on c-command relations and ellipsis (Marantz 1993, Ngonyani 1996, Pylkkänen 2008). The position of the applicative in relation to the passive and the stative is shown in Figure 2.

Figure 2: The position of the applicative in relation to the passive and the stative

The derivation begins with the merger of the verb *mwag* 'spill' with the object *maji* 'water'. Next, this VP merges with the v stative *-ik*, moving the *V* to attach to the left of the stative head forming *mwag-ik*. This new stative head merges with the Appl *-i* to create *mwag-ik-i*. This applicative has introduced the applied object *3sg* in the specifier of ApplP. This *mwag-ik-i* complex moves to Voice to pick up the [−active] feature, attaching on the left of the passive head *-w* to create *mwag-ik-i-w*. The applied object then moves to SpecTP to satisfy the EPP requirement. The derivation is shown in Figure 3.

This structure does not show the details of tense *li-* and mood *-a* in order not to crowd the presentation of the derivations.

The derivation is consistent with two features highlighted in the beginning, namely, (a) the passive and stative both thought to act on the external argument may appear on a verb; and (b) the order of the verbal suffixes.

Figure 3: Derivation details
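The successive head-movement steps described in this section can also be traced mechanically. The following is a minimal sketch for illustration, not part of the analysis itself: the class `ComplexHead` and its method `move_to` are hypothetical names, argument movement (the applied object raising to SpecTP) is not modeled, and tense *li-* and the final vowel *-a* are left out, as in the structure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexHead:
    morphs: tuple  # exponents in surface order, e.g. ("mwag", "ik")
    labels: tuple  # matching category labels, e.g. ("V", "st")

    def move_to(self, label: str, exponent: str) -> "ComplexHead":
        """Head movement: the complex head attaches to the left of the
        next higher head, so the new exponent surfaces as a suffix."""
        return ComplexHead(self.morphs + (exponent,), self.labels + (label,))

    @property
    def form(self) -> str:
        return "-".join(self.morphs)

    @property
    def gloss(self) -> str:
        return "-".join(self.labels)


# Heads in bottom-up order: v (stative), Appl, Voice (passive).
HEADS = [("st", "ik"), ("appl", "i"), ("pass", "w")]

head = ComplexHead(("mwag",), ("V",))
for label, exponent in HEADS:
    head = head.move_to(label, exponent)
    print(head.gloss, head.form)
```

Because each step adds the higher head's exponent to the right edge of the complex head, the bottom-up order of the heads (v, Appl, Voice) directly yields the suffix order st-appl-pass.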

## **5 Implications**

The interpretation of the stative indicates some interesting parallels with the argument structure of transitivity alternation (Hale & Keyser 2002). In their theory, Hale and Keyser assert that structures are characterized by two kinds of relations: head-complement relations and specifier-head relations. These relations are projected from the lexical entry of each head. Such lexically determined relations are responsible for the difference between a verb that takes a complement prepositional phrase, as in (12), and adjunct PP in (13).

(12) a. They smeared *mud* on the wall.
b. \* *Mud* smeared on the wall.

(13) a. The puppy spilled *water* on the floor.
b. *Water* spilled on the floor.

Hale & Keyser (2002) characterize *smear* as an agent-manner verb taking a complement PP with the reading 'smear X on Y.' The verb includes information regarding its adverbial feature describing what the external argument does. On the other hand, the verb *spill* is a patient-manner verb with semantic features expressing motion, distribution, dispersal, or attitude of the patient. This is the alternating type because its features are associated with the internal argument.

Swahili does not permit such alternation. However, the stative derivation yields patterns of readings reflecting the split between agent-manner verbs and patient-manner verbs. We use the verbs *ziliba* 'smear' and *mwaga* 'spill'.


(15) b. Ma-ji ya-li-mwag-ik-a sakafu-ni.
6-water 6sm-pst-spill-st-fv 9.floor-loc
'Water spilled on the floor.'

c. Ma-ji ya-li-mwag-ik-i-a sakafu-ni.
6-water 6sm-pst-spill-st-appl-fv 9.floor-loc
'Water spilled onto the floor.'

The stative in (14b) creates a potentiality reading for the agent-manner verb, while (15b) has a resultative reading for the patient-manner verb. The addition of the applicative results in an ungrammatical form for the agent-manner verb (14c), and a grammatical form for the patient-manner verb (15c). The applicative constructions show clearly that subject promotion is possible with patient-manner predication and not possible with agent-manner predication.

In the light of Hale and Keyser's proposal, it is clear at this point that the argument structure of the stative derivation in Swahili calls for further investigation. The study of the interaction of these two verb types and Vendler's aspectual types is likely to lead to a better understanding of the stative readings.

### **6 Conclusions**

This paper set out to examine the co-occurrence of stative *-ik* and passive *-w* in Kiswahili. Both reduce the valency of the verb by either suppressing the external argument or eliminating it altogether. They co-occur when there is an intervening applicative affix, and in the st-appl-pass order. We offer an analysis of the non-canonical argument realization and explore how the alternation can be conceptualized from a cross-linguistic perspective.

This paper argues that Voice projects on top of v, and that v is the head that interacts with the external argument. This is in line with Collins (2005) and Beck & Johnson (2004), who maintain that Voice is independent of agenthood. Voice selects for a particular vP carrying certain properties, but this does not mean that actives select for a vP with an external argument: the Voice distinction is independent of the presence or absence of an EA. This explains the empirical observation that unaccusatives are still active in the Voice sense despite the lack of an EA, given that Voice is not the head responsible for agenthood.

A prediction of the current analysis is that there are two ways to derive a middle. In a regular middle, where only one argument is realized, the agent argument is first removed and the middle is then built on a subjectless vP. In middle varieties where both arguments are realized, Appl introduces the recipient goal argument inside a low-applicative structure, and the middle is then built on a vP whose two arguments are realized via ApplP.

## **Abbreviations**


## **Acknowledgments**

The language consultant is Professor Deogratias Ngonyani. Starting from (1), all Kiswahili data points (except for those that are explicitly marked by citations) were collected in class "FS17 LIN881 The structure of Kiswahili". Asante Mwalimu Deo Ngonyani! We are extremely grateful to ACAL 49 for their comments and engagement in our Q&A session. This project has also benefited from early discussions with LIN881 classmates.

The first author would like to express deep gratitude to Professor Deogratias Ngonyani. Xiayimaierdan Abudushalam, Yuankai Chen, Adam Smolinski, and Rachel Stacey also deserve thanks, for their invaluable comments at various points of the writing of this paper. All errors are ours.

### **References**


# **Chapter 10**

# **Propositional attitude verbs and complementizers in Medumba**

Terrance Gatchalian<sup>a</sup> , Rachel Lee<sup>a</sup> & Carolin Tyrchan<sup>b</sup> <sup>a</sup>University of British Columbia <sup>b</sup>University of Potsdam

We present the preliminary results of an investigation on complementizers and their interaction with propositional attitude verbs in Medumba (Grassfields Bantu, Niger-Congo). This initial sketch of the Medumba C-system opens up questions about the syntactic distribution and semantic force of the various Cs. There are two clause-initial Cs, /mbʉ/ and /ndà/, of which /mbʉ/ has three syntactically conditioned allomorphs: [mbʉ̀]-L, [mbʉ́ʉ̀]-HL and [mbʉ̀ʉ́]-LH. The clause-final [lá] obligatorily co-occurs with two of the clause-initial Cs, namely [mbʉ́ʉ̀]-HL and [ndà]. Additionally, the inventory of propositional attitude verbs (pavs) is quite small, with only four identified thus far: two are monomorphemic ([lεn] 'know', [t͡ʃúp] 'say') and two are bi-morphemic ([kwὲ-də̀] 'think-iter', [bέt-tə́] 'ask-iter'). We make the case for a syntactically conditioned floating H-tone. Additionally, we propose a basic structure of the Medumba CP and raise questions about the scope of polarity and the nature of the clause-final particle /lá/.

**Keywords:** Medumba, complementizer, propositional attitude, embedding

## **1 Introduction**

Medumba, a Bamileke language of western Cameroon, exhibits a wide use of grammatical tone. While the patterns of Medumba tone have been described in detail by Voorhoeve (1971), the grammatical functions which are expressed through tonal morphemes have not received the same level of detailed description or analysis, and have often been left out of the discussion of tone in Medumba. In fact, the status of tone as a syntactically conditioned element is not discussed in the current literature on Medumba. Instead, tone in Medumba has been treated as a strictly phonological phenomenon. This, as we will outline below, obscures some important relationships within the grammar of Medumba (see also Keupdjio 2020).

Terrance Gatchalian, Rachel Lee & Carolin Tyrchan. 2022. Propositional attitude verbs and complementizers in Medumba. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 183–194. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393750

This paper has two principal aims. Firstly, it describes both the system of Propositional Attitude Verbs (pavs) and the system of complementizers, which have received little descriptive treatment. Secondly, it presents an argument for tonal morphology specifically within the complementizer system, further suggesting the possibility of syntactic activity of tone elsewhere in the language.

## **2 Propositional attitude verbs**

We will be defining propositional attitude verbs (pavs) as verbs conveying mental attitude or communicative verbs, as outlined in Pearson (2021) (see also Asher 1987). The inventory of pavs in Medumba is relatively small, consisting of two mono-morphemic forms (1a, 1b) and two bi-morphemic forms (1c, 1d).


The unsuffixed, mono-morphemic forms of the bi-morphemic pavs (/kʷὲ-də̀/ and /bέt-tə́/) are unattested. The iterative morpheme is underlyingly toneless, and takes the final tone of the base to which it is affixed. The bi-morphemic pavs appear to be lexicalized iterative forms, as can be seen by productive use of the iterative suffix, such as in (2).

(2) a. [nʉ̀ t͡ʃúb-ə́] 'to say' b. [nʉ̀ t͡ʃúp-tə́] 'to talk'

These four propositional attitude verbs participate in a variety of constructions, taking either a nominal complement, as in (3a) or a clausal complement as in (3b).

(3) a. wàtὲέt Watat lὲέn know nùŋgὲ Nuga 'Watat knows Nuga.' b. mʉ̀ 1sg lέn know mbʉ̀ comp nzì envy kʰúʔú taro t͡ʃʷὲέt pres nd͡ʒέ hurt 'I know that Numi is hungry.'

When the complement of the pav is a noun, the bare noun appears. That is, the thematic role assigned to the DPs is not marked by any overt morphosyntactic element. Instead, the semantic interpretation follows from the rigid order of constituents, S V O<sub>direct</sub> O<sub>indirect</sub>. As exhibited in examples (4a, b), the theme constituent must precede the goal constituent.

(4) a. wàtὲέt Watat t͡ʃúp say nʉ́ nʉ́ truth nùŋgὲ Nuga 'Watat said the truth to Nuga.' b. # wàtὲέt Watat t͡ʃúp say nùŋgὲ Nuga nʉ́ nʉ́ truth Intended: 'Watat said the truth to Nuga.'

When the pav takes a clausal complement, the left-edge of the clause is delineated by one of four complementizers. The properties of these complementizers and their analysis form the basis for the remaining sections of this paper.

Table 1 roughly outlines the subcategorization properties of the pavs. All pavs except /kʷὲ-də̀/ can take a DP as a complement. A more detailed investigation of the subcategorization of pavs is left as an avenue for further research.


Table 1: Subcategorization properties of pavs (*tentative*)

We will also include the modal verb [bʰʷɔ̀ɔ́] 'be.good' in the following discussion. It should be noted that this verb occurs only with the impersonal subject, as can be seen in (5a). While it is not considered a pav, it is included here due to its participation in subordinating constructions and its interaction with the complementizers.

	- b. \* nùmí Numi bʰʷɔ̀ɔ́ be.good t͡ʃúp say nʉ́ nʉ́nə́ truth Intended: 'Numi should say the truth.'

## **3 Complementizers and clausal complements**

#### **3.1 Introduction**

There are four complementizer forms in Medumba, each varying in the semantic contribution it gives to the subordinate clause. It will be argued that their co-occurrence restrictions arise principally from incompatibility between the semantics of the complementizer and the semantics of the subsequent selecting verb. We will also argue for a compositional account regarding the semantics of the complementizers responsible for this incompatibility with certain pavs.

These four complementizers can be divided into two different underlying forms based on their segmental content; one being referred to as the /ndà/-form complementizer, and the other as the /mbʉ/-form complementizer, which has three tonal surface forms.

#### **3.2 /ndà/-form complementizers**

To begin with the simplest case, the clause-initial /ndà/ complementizer, in (6), is uniformly low-tone (L). Furthermore, as in (6a), it obligatorily co-occurs with the clause-final /lá/ particle.

	- b. \* á 3.sg bʰʷɔ̀ɔ́ be.good ndà comp nùmí Numi ʒʉ́ʉ̀ eat ʒú thing Intended: 'It is good that Numi ate something.'

The syntactic and semantic properties of this complementizer are subject to further investigation and will not be discussed in great detail in this paper. However, for the purposes of our discussion, it is relevant that this complementizer does not appear to have surface tonal allomorphy as does the complementizer that will be discussed immediately below.

#### **3.3 /mbʉ/-form complementizers**

#### **3.3.1 L-tone complementizer**

There are three complementizers which all carry the segmental content /mbʉ/ but have distinct tonal melodies: [mbʉ̀]-L, [mbʉ́ʉ̀]-HL, and [mbʉ̀ʉ́]-LH. These complementizer forms each have a unique distribution under the four pavs and [bʰʷɔ̀ɔ́]. The possible co-occurrence patterns are given in Table 2 below. All logically possible combinations of a pav and a C with or without [lá] that are not depicted here were tested as well, but were judged as infelicitous by our consultant.

Table 2: Co-occurrence of complementizers with pavs


The L-tone allomorph [mbʉ̀] can introduce embedded indirect speech under three pavs, as seen in (7a–c). The verb [bέt-tə́] is incompatible with [mbʉ̀], as in (7d).

	- b. mʉ̀ 1.sg t͡ʃúp say nùmí Numi mbʉ̀ comp t͡ʃə̀ə́ŋ food ʙə́ be.cooked 'I say to Numi that the food is ready.'
	- c. mʉ̀ 1.sg kʷὲdə̀ think mbʉ̀ comp nzì envy kʰúʔú taro t͡ʃʷὲέt pres nd͡ʒέ hurt nùmí Numi 'I think that Numi is hungry.'
	- d. \* mʉ̀ 1.sg bέttə́ ask nùmí Numi mbʉ̀ comp t͡ʃə̀ə́ŋ food ʙə́ be.cooked Intended: 'I ask Numi if the food is cooked.'

e. # á 3.sg bʰʷɔ̀ be.good mbʉ̀ comp nùmí Numi ʒʉ́ʉ̀ eat ʒú something Intended: 'It is good that Numi eats something.'

As can be seen in example (7e), the combination of [bʰʷɔ̀] and [mbʉ̀] was also judged as infelicitous by our consultant.<sup>1</sup> As [mbʉ̀] introduces embedded speech and appears to simply introduce an embedded clause, it could be that [bʰʷɔ̀] is simply incompatible with the semantics of the embedded clauses tested. This observation leads to the assumption that the incompatibilities of certain pavs and Cs actually arise only on a semantic level, thus suggesting that the composed meanings that arise from their combination have to be accounted for at LF. As this paper aims at describing and explaining the phenomenon on a morphosyntactic level, this semantic interaction after narrow syntax is left to further investigation.

#### **3.3.2 LH-tone complementizer**

The LH-allomorph [mbʉ̀ʉ́] seems to be well-formed only in contexts that are compatible with a modal context, exhibited in (8a–e).


<sup>1</sup>Note here that the tone on the modal verb differs from that given in (6). A discussion of this follows at the end of §3.3.2.

The segmental similarity between the LH-allomorph and the L-allomorph, despite their semantic and distributional heterogeneity, raises the question of whether these forms are related to each other. We will argue that assuming a positive answer to this question yields insight into how tone can be syntactically conditioned in the complementizer system and, as a consequence, opens up the possibility of understanding tone elsewhere in the grammar as an integral part of syntax proper.

The core of the present discussion rests on the assumption that the complementizer can be meaningfully decomposed into an underlying /mbʉ̀/-form of the complementizer and a syntactic H-tone. There are both empirical and theoretical consequences for such an analytical assumption, so it is worth decomposing the assumption itself.

Consider the [mbʉ̀ʉ́]-LH complementizer, which, as was illustrated above, is interpreted as having deontic meaning. A decompositional analysis of this complementizer takes this form as the combination of the low-toned /mbʉ̀/-form, which introduces the embedded clause, with a syntactic H-tone, which provides the deontic force whose effects are seen from its distributional restrictions under the pavs.

The first possible objection to this analysis is the empirical evidence for the decomposition. To answer this, we can consider cases of deontic force without the presence of a complementizer. Consider the data in (9) below, where (9a) shows an embedded clause introduced by the LH-complementizer as expected. In (9b), however, the complementizer is not present; instead, the verb surfaces with an additional obligatory H-tone.

	- b. á 3.sg bʰʷɔ̀ɔ́ be.good nùmí Numi t͡ʃúp say nʉ́ nʉ́nə́ truth 'Numi should say the truth.'

When the complementizer is not pronounced as in (9b) (whether it is a null element in C or syntactically absent), we see that the H-tone is still present, this time surfacing on the verb [bʰʷɔ̀ɔ́], which appears as L-toned in (9a). This demonstrates the presence of a floating H-tone, which cannot be taken as an inherent part of a simplex [mbʉ̀ʉ́]-LH. Rather, the persistence of the H-tone in this construction is a direct result of its morphosyntactic independence from the rest of the complementizer.
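The decomposition argued for here can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: tonal melodies are modelled as tuples of `"L"` and `"H"`, vowel lengthening is ignored, and the names `dock_floating_h` and `realize` are hypothetical labels introduced for this example rather than part of the analysis itself. The floating H docks on the complementizer when one is pronounced and otherwise on the modal verb, as described for (9a) and (9b).

```python
# Illustrative sketch only; names and data structures are assumptions.
L, H = "L", "H"

COMP_BASE = (L,)   # underlying low-toned /mbʉ̀/ complementizer
VERB_BASE = (L,)   # the modal verb 'be.good' as it surfaces in (9a)


def dock_floating_h(host):
    """Attach a floating H tone to the right edge of a host melody."""
    return tuple(host) + (H,)


def realize(comp_present):
    """Return (verb melody, complementizer melody) for a deontic clause.

    If the complementizer is pronounced, the floating H docks on it (LH).
    If not, the H is still realized, this time on the verb.
    """
    if comp_present:
        return VERB_BASE, dock_floating_h(COMP_BASE)
    return dock_floating_h(VERB_BASE), None


print(realize(True))
print(realize(False))
```

In both calls exactly one H is realized, which is the sense in which the H-tone behaves as independent of the segmental complementizer.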

#### **3.3.3 HL-tone complementizer**

The HL allomorph [mbʉ́ʉ̀] introduces embedded polar statements, and requires the presence of the clause-final [lá], (10a–e).

	- b. mʉ̀ 1.sg t͡ʃúp say nùmí Numi mbʉ́ʉ̀ comp t͡ʃə̀ə́ŋ food ʙə́ be.cooked lá la 'I wonder (to Numi) if the food is ready.'
	- c. mʉ̀ 1.sg kʷὲdə̀ think mbʉ̀ comp nzì envy kʰúʔú taro t͡ʃʷὲέt pres nd͡ʒέ hurt nùmí Numi lá la 'I think (about) if Numi is hungry.' (Whether Numi is hungry or not by now, it's something that I thought about.)
	- d. mʉ̀ 1.sg bέttə́ ask nùmí Numi mbʉ̀ comp t͡ʃə̀ə́ŋ food ʙə́ be.cooked lá la 'I ask Numi if the food is ready.'

The final [lá] element is obligatory with the clause-initial HL-complementizer. This appears to parallel two other instances: the floating H-tone present in the LH-complementizer, and the /ndà/-form complementizer with its clause-final /lá/. Given that there are two elements in all these cases, that these elements appear to delineate the embedded clause, and further that they interact directly in the case of the floating H tone, we will assume that these elements are all local to each other. That is, syntactically, they are all articulations of the CP-domain. Such an approach might invoke the work done in the Cartographic approach, which assumes multiple functional projections within the CP-domain, each with a dedicated function (Rizzi 1997). While our proposal is not immediately incompatible with the details of this approach, the Cartographic CP is decomposable into multiple functional projections, which are mappings between syntactic position and function. We will put aside the question of how this analysis might translate into a Cartographic approach for further research.

This assumption of having two CP-elements is further motivated by the presence of these two elements in other semantically distinct contexts, such as relative clauses (Kouankem 2011) and in other syntactically distinct contexts, such as the DP-domain and its articulation (Kouankem 2011, 2012).

#### **3.4 Clause-final /lá/**

As described earlier, the clause-final element /lá/ is obligatory at the end of CPs introduced by [mbʉ́ʉ̀]-HL and [ndà]-L. It uniformly carries H-tone. The /lá/ particle also appears in several other environments and is not exclusive to complementizer contexts. When /lá/ occurs, it is always obligatory. That is, the co-occurrence with the /mbʉ/-form complementizers is strict: in constructions where the particle is grammatical, it must be present, and in constructions where it does not appear, adding it is ungrammatical. The nature and function of the /lá/ particle need to be further investigated.

In questions that contain the [mbʉ́ʉ̀]-HL C, /lá/ is in complementary distribution with the Q-particle /kí/. This supports our previous assumption that the H-tone /lá/ and /kí/ contribute to the syntax of the CP. As the two particles are in complementary distribution with the floating H-tone as we showed earlier, we assume that the H-tone carried by /lá/ and /kí/ is inherent to them. As further evidence, there are no instances of an L-tone /lá/ or /kí/ in our research.

As mentioned in §3.3.3, [mbʉ́ʉ̀]-HL can be understood as introducing embedded polar statements. Consider the data in (11a–d), where (11a) contains a statement while (11b) outlines a direct question. As can be seen in the cited examples, the clause-final particle /lá/ and the polar question marker /kí/ are in complementary distribution.

	- b. ú 1.sg lὲn know mbʉ́ʉ̀ comp á 3.sg lὲgdə̀ə́ forget bʰúʔŋwànì packet.school kí q 'Would you know if he forgot the book?' (lit. 'Would you know (or not) if he forgot the book (or not)?')
	- c. \* ú 1.sg lὲn know mbʉ́ʉ̀ comp á 3.sg lὲgdə̀ə́ forget bʰúʔŋwànì packet.school lá la kí q Intended: 'Would you know (or not) if he forgot the book (or not)?'
	- d. \* ú 1.sg lὲn know mbʉ́ʉ̀ comp á 3.sg lὲgdə̀ə́ forget bʰúʔŋwànì packet.school kí q lá la Intended: 'Would you know (or not) if he forgot the book (or not)?'
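The distribution of the clause-final particles described so far can be summarized as a small well-formedness check. The sketch below is a hedged illustration, not a formal proposal: the ASCII labels (`"mbu-L"`, `"mbu-LH"`, `"mbu-HL"`, `"nda"`) and the function name `particles_ok` are hypothetical, linear order of the particles is not modelled, and combinations of /kí/ with complementizers other than the HL form are rejected as undescribed rather than judged.

```python
# Hypothetical ASCII labels for the four complementizer forms:
# "mbu-L", "mbu-LH", "mbu-HL" stand for the three /mbʉ/ forms, "nda" for /ndà/.
REQUIRES_LA = {"mbu-HL", "nda"}


def particles_ok(comp, has_la, has_ki=False):
    """Check the clause-final particles of an embedded clause.

    With the HL form, exactly one of /lá/ and /kí/ must appear, as in (11).
    Elsewhere /lá/ is present if and only if the complementizer requires it.
    """
    if comp == "mbu-HL":
        return has_la != has_ki
    if has_ki:
        # Not covered by the data discussed here, so no judgment is returned.
        raise ValueError("ki with this complementizer is not described")
    return has_la == (comp in REQUIRES_LA)


print(particles_ok("mbu-HL", has_la=True))               # statement with la
print(particles_ok("mbu-HL", has_la=False, has_ki=True)) # question with ki
print(particles_ok("mbu-HL", has_la=True, has_ki=True))  # both particles
```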

While /lá/ appears to only interact with the C of the embedded clause, the presence of /kí/ seems to also affect the matrix clause. Although polarity is indicated through the embedded clause, it may also be found within the matrix clause. The question particle induces a polarity reading in the matrix clause. Lacking the question particle, the embedded clause is still polar, if it contains [mbʉ́ʉ̀]-HL. This /kí/-independent polarity is likely linked to the HL-form complementizer, as /lá/ is not limited to these contexts. Due to our lack of data concerning the scoping behaviour of /kí/, this topic should be investigated further with new data. Additionally, it might also be fruitful to further investigate if [mbʉ́ʉ̀]-HL is composed of an underlying [mbʉ̀]-L plus a floating tone like its deontic counterpart. In this scenario, which we will follow further in section 4, the floating H tone would realize polarity. As a result, it would be necessary to investigate how the polarity on the complementizer and the question particle interact, if this could explain the unusual scope of polarity in (11) and what implications it would have to assume multiple, phonologically similar, but syntactically and semantically distinct floating tones for modality and polarity.

## **4 Deriving the CP**

Taking into account the above discussion, this section aims to provide a sketch for the derivation of the Medumba CP. The presence of (at least) two projections will be taken for granted on the basis of the discussion in the previous sections. Additionally, the distributional clues of the elements within those projections provide us with their potential syntactic positions.

First, consider the floating H-tone of the LH-complementizer. This, assuming that it linearly *follows* the underlying L-complementizer and that linear order is a heuristic for syntactic position, gives us the following syntactic structure in (12). Furthermore, we will assume that Medumba is uniformly head-initial and that all linearization that deviates from this is a result of movement (Kayne 1994).

(12) [CP mbʉ̀ [CP H TP ]]

The above derivation shows two available positions within the CP.<sup>2</sup> Naturally then, the /lá/ particle, which delineates the right edge of embedded clauses, might head this lower CP projection. This is further suggested by the fact that it is in complementary distribution with the deontic H-tone of the LH-complementizer, as discussed previously. Under the model in (12) we have, as desired, a local relationship between the deontic H-tone and the complementizer [mbʉ̀]. In the case of /lá/ heading the lower CP, this requires movement of the TP rightward past /lá/ as /lá/ is a clause-final element. The exact mechanism responsible is a question for future research.

<sup>2</sup>We would like to thank an anonymous reviewer for pointing out a bracketing issue that arises here: As the deontic H-tone is below the [mbʉ̀] complementizer, it should not be available for selection by a pav. This remains a problem, which the phonological process of tone docking itself is unable to solve. One possible stipulative solution is to treat the H-tone as an affix, moving it and adjoining its formal features to the upper CP.

This raises questions about the status of the polarity H-tone of the HL complementizer. Since this co-occurs with the /lá/ complementizer and its surface realization suggests that it precedes the underlying L-complementizer, we might suggest an additional position above the structure, as in (13).

(13) [XP (H) [CP mbʉ̀ [CP lá TP ]]]

Given its semantic force, we assume that this upper position occupied by the polarity-inducing H-tone might be an optional PolP head. However, such detailed questions about the CP-structure are subject to further investigation.

### **5 Concluding remarks and remaining questions**

While this paper is far from presenting a complete analysis of the complementizer system in Medumba, we hope this discussion has provided a motivation for the presence of syntactically conditioned tone by focusing on the CP domain. By analyzing the impact of pavs and how they interact with complementizers, we realize that there is much to be gained from looking at some tones in Medumba as properly syntactic rather than pushing off all tonal alternations to the phonology, an approach which can possibly be extended to other domains of the syntax. Furthermore, the interaction of complementizers, grammatical tone and a small inventory of pavs proves to be a successful strategy for covering the semantic possibilities that are lexical in languages with a richer pav inventory.

Major questions that have arisen during our research include understanding the presence of /lá/ and how the scope of polarity works in Medumba. /lá/ holds an important clause-final position, but at this time we cannot conclusively say whether or not /lá/ holds a definitive role in relation to any of the complementizers. Although tone is the main feature distinguishing the meanings and allomorphs, we cannot definitively state whether or not it determines which propositional attitude reading is selected. Additionally, as the scopal behaviour of polarity is still unclear and we cannot define a proper position for the PolP so far, we suggest further investigation on this topic.

## **Abbreviations**

iter iterative recp reciprocal

## **Acknowledgements**

We would like to thank Hermann Keupdjio for sharing his language with us, Rose-Marie Déchaine for her guidance, the UBC LING 431/432/531/532 class of 2017/2018 for their comments and discussions, and the attendees of ACAL 49 for their comments. All remaining errors of data and interpretation are the authors'.

## **References**


# **Chapter 11**

# **Overt subjects and agreement in Zulu infinitives**

Claire Halpert

University of Minnesota

This paper explores a surprising interaction of agreement and concord inside infinitive clauses in Zulu. In Zulu, as in many Bantu languages, infinitive verbs are marked with noun class 15/17 morphology. Internal arguments of infinitives are typically unmarked, while the external argument must receive so-called associative morphology and must precede internal arguments. I argue that the external argument in these constructions is realized in Spec,vP, a finding that has a number of consequences for our understanding of clause structure and agreement in Zulu and related languages.

## **1 Introduction**

This paper investigates infinitive clauses in the Bantu language Zulu that have overt agents. As illustrated below in (1), agents of Zulu infinitives must precede internal arguments (1a) and cannot follow them (1b).<sup>1</sup>

(1) a. [U-ku-nikeza aug-15-give kwa-khe 15.assoc-1pro izingane aug.10child amavuvuzela] aug.6vuvuzela ku-ya-ngi-casula. sm15-dj-1sg.om-annoy 'His giving the children vuvuzelas annoys me.'

<sup>1</sup>All examples in this paper are from Zulu, unless otherwise noted. Unsourced Zulu examples are taken from my own fieldwork.

Claire Halpert. 2022. Overt subjects and agreement in Zulu infinitives. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 195–209. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393752

b. \* [U-ku-nikeza aug-15-give izingane aug.10child amavuvuzela aug.6vuvuzela kwa-khe] 15.assoc-1pro ku-ya-ngi-casula. sm15-dj-1sg.om-annoy 'His giving the children vuvuzelas annoys me.'

These constructions exhibit a puzzling constellation of properties. As the examples above illustrate, overt agents in infinitives must be marked with so-called associative morphology, which typically marks adnominal adjuncts (e.g. Sabelo 1990, Halpert 2015, Pietraszko 2019). At the same time, they require VSO word order, placing the associative-marked subject in a position that is otherwise unusual for adjuncts in the language but typical for in situ subjects. I will argue, using evidence from binding, that the overt subject in these constructions is truly in an argument position, in Spec,vP, despite the appearance of associative morphology. This conclusion raises an additional puzzle: as we will see in §3.2, these overt subjects do not block object agreement from appearing inside the infinitive, unlike vP-internal subjects of finite clauses in the language.

How can we reconcile this mix of properties? I will suggest two instructive parallels: Linker Phrases in Kinande (Baker & Collins 2006, Schneider-Zioga 2015a,b) and external arguments of passives in Zulu. If we treat the associative morphology that appears on subjects in infinitives as a head in the clausal spine, akin to the Kinande Linker, then the patterns found in these infinitival constructions with respect to agreement are analogous to those found in Zulu passives, as I will discuss in §4.

## **2 Background: A subject syntax baseline**

Zulu is a Bantu language (S42) spoken primarily in South Africa. In this section, I will lay out some of the basic properties of Zulu that will allow us to understand the puzzles posed by the subjects of infinitives. In particular, we will need to establish the expected patterns of agreement and word order, the basic properties of infinitives, and the basic properties of so-called associative constructions.

#### **2.1 Agreement and word order**

Zulu nouns are divided into 14 noun classes that are notated by number. Agreement and concord processes are glossed using noun class numbers: a marker whose number matches the number on a noun agrees with that noun. Like most Bantu languages, Zulu has obligatory subject agreement morphology and optional object agreement morphology on verbs. In Zulu, predicates agree with vP-external arguments only: subject agreement tracks the highest vP-external (or pro-dropped) argument, while object agreement appears when a lower argument is vP-external or pro-dropped. In situations when there is no vP-external argument, an expletive agreement *ku-* (class 15/17) appears in the subject agreement spot. The verb in Zulu undergoes head movement to a vP-external position, so any preverbal arguments are outside of vP (Buell 2005, Halpert 2015).

In (2) below, we can see subject agreement tracking a pre-verbal/pro-dropped subject. The postverbal object is inside vP, so no object agreement appears.<sup>2</sup>

	- b. (*Omakhelwane*) aug.2neighbor *ba*sm1 xova make ujeqe. aug.1steamed.bread 'The neighbors are making steamed bread.'

When the subject remains inside vP, we get default agreement: class 17 *ku-*.<sup>3</sup> In the examples in (3) below, the post-verbal subject is followed by a low adverb, *kahle*, 'well,' which must appear inside vP (Buell 2005).

	- b. *Ku*sm17 pheka cook uZinhle aug.1Zinhle kahle. well 'Zinhle cooks well.'

When objects remain in situ, no object agreement appears, as we saw in (2). When an object appears outside of vP, it controls object agreement:<sup>4</sup>

(4) UZinhle aug.1Zinhle u-ya-*m*-xova sm1-dj-om1-make kahle well *ujeqe*. aug.1steamed.bread 'Zinhle makes steamed bread well.'

<sup>2</sup>We can determine the position of postverbal material using the distribution of the present tense *disjoint* morpheme *-ya-*, which appears on the verb just in case vP is empty (Buell 2005, Halpert 2015, 2017). In the examples in (2), there is no disjoint morpheme (the verb appears in its bare *conjoint* form), so the postverbal object must be inside vP.

<sup>3</sup>As Buell & de Dreu (2013) note, in modern Zulu, classes 15 and 17 have become indistinguishable. For clarity here, I follow the convention of marking default agreement as class 17, but infinitives as class 15.

<sup>4</sup>Object agreement is typically *required* for vP-external objects, with limited exceptions in the case of double dislocation constructions (e.g. Adams 2010, Zeller 2012). The placement of the low adverb *kahle* and the appearance of the disjoint (*-ya-*) here indicate that the object is outside of vP (Buell 2005, Halpert 2015).


We saw in (3b) that subjects can remain inside the vP and cannot control subject agreement from this position. When the subject remains low in a finite clause, all lower arguments must also remain vP-internal.<sup>5</sup> As expected, these trapped internal arguments cannot be pro-dropped to control either subject or object agreement:

	- b. \* A-phek-e sm6-cook-pst uSipho aug.1Sipho (amaqanda). aug.6egg Intended: 'Sipho cooked them.'
	- c. \* Kw-a-phek-e sm17-om6-cook-pst uSipho aug.1Sipho (amaqanda). aug.6egg Intended: 'Sipho cooked them.'
	- b. \* Kw-a-zi-nikeza sm17-pst-om10-give uMfundo aug.1Mfundo amavuvuzela aug.6vuvuzela (izingane). aug.10child Intended: 'Mfundo gave them vuvuzelas.'
	- c. \* Kw-a-wa-nikeza sm17-pst-om6-give uMfundo aug.1Mfundo izingane aug.10child (amavuvuzela). aug.6vuvuzela Intended: 'Mfundo gave them to the children.'

Word order in these transitive expletive constructions is completely rigid: V S (IO) DO, which I have argued reflects the base positions of the arguments (Halpert 2015). To summarize the basic picture of agreement and word order in finite clauses, we have seen in this section that agreement in Zulu corresponds with movement out of vP and that low subjects block other arguments inside vP from moving or agreeing.<sup>6</sup>
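The generalizations summarized above can be restated as a short Python sketch. This is a simplified illustration under stated assumptions, not an implementation of any particular analysis: the `Arg` class and the `agreement` function are hypothetical names, arguments are assumed to be listed in base order S (IO) DO, only one object marker is produced, and the limited exceptions mentioned in the footnotes (double dislocation, locative and instrumental subjects) are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Arg:
    role: str          # "S", "IO" or "DO" (labels assumed for this sketch)
    noun_class: int    # noun class number used in the glosses
    vp_external: bool  # True if the argument has left vP or is pro-dropped


def agreement(args: List[Arg]) -> Tuple[str, Optional[str]]:
    """Return (subject marker, object marker) for arguments in base order."""
    subject, lower = args[0], args[1:]
    if not subject.vp_external:
        # A low subject traps all lower arguments inside vP.
        if any(a.vp_external for a in lower):
            raise ValueError("lower argument cannot leave vP past a low subject")
        return "sm17", None  # default (expletive) agreement
    external = [a for a in lower if a.vp_external]
    om = f"om{external[0].noun_class}" if external else None
    return f"sm{subject.noun_class}", om


# Pattern of (4): class 1 subject and class 1 object both outside vP.
print(agreement([Arg("S", 1, True), Arg("DO", 1, True)]))
# Pattern of (3b): the subject stays inside vP.
print(agreement([Arg("S", 1, False)]))
```

Raising an error for a vP-external object below a low subject is one way to encode the starred examples; returning a flag instead would serve equally well.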

<sup>5</sup>There are a few limited cases in Zulu where a locative or instrumental argument can control subject agreement while the external argument remains in vP (Buell 2007, Zeller 2013). Zeller (2013) argues that these cases involve introduction of the instrument or locative in a position structurally higher than vP, which would make them non-exceptions to this generalization.

<sup>6</sup>Zeller (2015) argues that in Zulu, T – the host of subject agreement – must probe before other heads in the same phase, including the host of object agreement. If the non-agreeing subject is a defective intervener, it would necessarily block both subject and object agreement on this view.

#### **2.2 Infinitives**

Infinitives in Bantu languages often look like verbs that bear noun class morphology (Schadeberg 2003). In Zulu, verbs that have the typical distribution of infinitives are marked with noun class 15(/17) *uku-*:

	- b. Ngi-yethemba 1sg-hope [uku-ni-bona]. aug.15-2pl.om-see 'I hope to see you all.'

The *uku-* prefix can attach above a variety of verbal inflectional morphology, including object agreement, negation, mood, and aspect, as (8) below illustrates. The basic generalization is that *uku-* can combine with morphology that would follow subject agreement in a finite clause.

	- b. uku-sa-m-thanda aug.15-dur-om1-love kabi badly uSipho aug.1Sipho 'to still really love Sipho'

As (7) and (8) show, Zulu infinitives can involve quite a bit of clausal structure above the verb root and seem to preserve the internal argument structure of the verb. As the *uku-* infinitive morphology suggests, from the outside, infinitives look just like nominals: as (9) illustrates, they can control subject and object agreement under the same circumstances that nominal arguments do:


The main takeaways about Zulu infinitives, then, are that they have an internal structure (below the position of subject agreement) that looks similar to finite clauses but an external structure that looks similar to nominals.

#### **2.3 Associative: Adnominal modification**

The final piece that we need in order to return to our puzzle is the so-called *associative* construction (Sabelo 1990, Halpert 2015, Jones 2018). Zulu marks a variety of adnominal dependents with a complex prefix consisting of two parts: a nominal concord that matches the head noun and a fixed *-a* vowel that predictably coalesces with the initial vowel of the noun it marks. In (10a), we can see the associative marking a possessor; in (10b), it marks a nominal modifier, and in (10c), it marks the internal argument of a low nominalization, where the root *cabang* 'think' has been nominalized as a class 3 noun:

	- b. isiminyaminya aug.7swarm *se*-mikhovu 7assoc.aug-4zombie 'a horde of zombies'
	- c. um-cabango aug.3thought *we*-mikhovu 3assoc.aug-4zombie 'the thought of zombies'

Multiple nominal modifiers can appear in the same noun phrase, each marked by a separate associative morpheme:

(11) isiminyaminya aug.7swarm *se*-mikhovu 7assoc-4zombie *so*-mthakathi 7assoc-1wizard 'the wizard's horde of zombies'

To summarize, the associative marks nominal adjuncts to a nominal, can occur multiple times within a single noun phrase, and is compatible with a range of semantic relationships. Pietraszko (2019) treats the associative in closely-related Ndebele as a nominal adjunct with concord. She analyzes the *-a* morpheme as a Linker head that takes the modifying nominal (or CP) as its complement and receives a copy of the phi (noun class) features of the head noun via a DP-internal concord process. On this view, multiple associative-marked nominals can easily modify a single head noun, with each attaching as a right-adjoined adjunct.<sup>7</sup>
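The shape of the associative prefix (concord plus *-a*, with coalescence) can be made concrete with a minimal Python sketch. Several things here are assumptions introduced for illustration: the function name `associative`, the concord table (restricted to the classes that occur in this chapter's examples), the coalescence table (a + a → a, a + i → e, a + u → o, which is consistent with the forms cited but not spelled out in the text), and the full augmented noun forms passed in as input.

```python
# Concord consonants for the head noun's class; only classes that appear
# in the chapter's examples are listed (an assumption of this sketch).
CONCORD = {1: "w", 3: "w", 7: "s", 15: "kw"}

# Result of the associative -a merging with a vowel-initial dependent.
COALESCE = {"a": "a", "i": "e", "u": "o"}


def associative(head_class, dependent):
    """Prefix `dependent` with the associative marker agreeing with the head.

    Vowel-initial dependents (augmented nouns) trigger coalescence;
    consonant-initial ones (pronominal stems) simply take concord + a.
    """
    concord = CONCORD[head_class]
    first = dependent[0]
    if first in COALESCE:
        return concord + COALESCE[first] + dependent[1:]
    return concord + "a" + dependent


print(associative(7, "imikhovu"))    # class 7 head, 'zombies'
print(associative(7, "umthakathi"))  # class 7 head, 'wizard'
print(associative(15, "khe"))        # class 15 head, class 1 pronoun stem
```

The outputs correspond to the forms segmented in the examples as *se-mikhovu*, *so-mthakathi* and *kwa-khe*.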

<sup>7</sup> See Jones (2018), though, for an analysis of Zulu associative as a D head. It's not clear how such an analysis would account for the cases of multiple associative-marked modifiers.

#### **2.4 Interim summary**

To summarize what we have seen in this section, subject and object agreement have a tight correlation with word order: non-agreeing arguments remain in their base position inside vP, while agreement is required to track vP-external arguments. Movement of the external argument out of vP is *required* in order for internal arguments to be available for movement and agreement. Subject and object agreement contrast with associative marking, which appears to be a concord process internal to the noun phrase that can mark multiple adnominal adjuncts. In the next section, we will return to the initial puzzle and see various ways in which these baseline expectations are not met in infinitives with overt agents.

## **3 The puzzle: Subjects in infinitives**

In the introduction, we saw that Zulu infinitives with overt subjects have two basic properties: rigid VSO word order and obligatory associative morphology on the subject. Given the baseline behaviors that we observed in §2, these properties alone raise questions about the underlying structure of infinitives with subjects. In this section, I will unpack these puzzles and discuss an additional puzzle raised by the behavior of object agreement in these constructions.

#### **3.1 Locating the associative-marked subject**

As we've seen in previous sections, class 15/17 *uku-* nominalizations have the distribution of infinitives and preserve internal argument structure, as illustrated in (12) below:<sup>8</sup>

(12) u-ku-saba aug-15-fear igundane aug.5mouse 'to fear mice/a fear of mice'

<sup>8</sup> I have seen limited cases where an internal argument can be marked with an associative, as in (i.a), but more often, the associative forces an external argument interpretation, as in (i.b):

(i) a. uku-bhubha aug.15-destroy kwe-zwe 15assoc.aug-5country 'the destruction of the country/to destroy the country' b. u-no-ku-saba 1sm-with.aug-15-fear kwe-gundane 15assoc.aug-5mouse 'S/he has a mouse's fear.' (same fears as a mouse, NOT a fear of mice)

#### Claire Halpert

When an overt external argument is present (here an experiencer, rather than an agent), it *must* be marked with associative morphology:

(13) Uku-saba aug.15-fear *kwa*-mi 15assoc-1sg.pro ku-khulu. 15sm-big 'My fear is big.'

This behavior of class 15 infinitives contrasts with nominalizations that involve other noun classes and that typically do not permit any preverbal morphology between the root and nominal prefix. In these low nominalizations, *all* arguments of the verb, including the external argument, must be marked with associative, as (14b) shows:

	- b. Isi-fiso aug.7-wish *sa*-kho 7assoc-2sg.pro *so*-ku-thola 7assoc.aug-15-get iziqu aug.8degree si-zo-fezeka. sm7-fut-come.true 'Your wish to get a degree will come true.'

In these low nominalizations where the nominals that correspond to the arguments of the root verb are marked with associative morphology, we see no evidence of c-command. For example, the external argument is unable to bind a pronoun inside the internal argument in (15):

(15) Isi-fiso aug.7-wish sa-wo 7assoc-1dem wonke 1.every umtwana aug.1person so-ku-bona 7assoc.aug-15-see uma aug.1mom wa-khe 1assoc-1pro si-zo-fezeka. sm7-fut-come.true 'Every child's wish to see her mother will come true.' (non-bound reading salient, speakers find bound reading difficult)

This lack of a bound reading is unsurprising on a view of associative adjunction like that of Pietraszko (2019), discussed in the previous section. If associative-marked nominals are always adjoined within the nominal phrase of the head that they modify, then both of the "arguments" in (15) would be right-adjoined and we would not expect the first to c-command the second.

In an infinitive with an associative-marked subject, we might expect the structure to similarly involve adjunction within the nominal domain, above the level of verbal structure associated with the root. If so, we first make a prediction about word order that we have already seen does not hold: an associative-marked subject should *follow* any unmarked internal arguments that are introduced in the verbal domain, contrary to what (16) shows:

(16) a. [U-ku-nikeza aug-15-give *kwa-khe* 15.assoc-1pro izingane aug.10child amavuvuzela] aug.6vuvuzela ku-ya-ngi-casula. 15sm-dj-1sg.om-annoy 'His giving the children vuvuzelas annoys me.' b. \* [U-ku-nikeza aug-15-give izingane aug.10child amavuvuzela aug.6vuvuzela *kwa-khe*] 15.assoc-1pro ku-ya-ngi-casula. sm15-dj-1sg.om-annoy Intended: 'His giving the children vuvuzelas annoys me.'

If we maintain the assumption that the associative-marked subject involves nominal adjunction, then the word order illustrated by (16), where internal arguments must appear to the right of the subject, would have to involve right-adjunction of these internal arguments in the nominal domain as well, similar to the low nominalization cases in (14) and (15). Given the lack of c-command in (15), we would predict that an internal argument that *follows* the subject in an infinitive would also not be c-commanded by it. Again, the prediction of the adjunction hypothesis is not met. As (17) illustrates, the associative-marked subject of an infinitive *can* bind into the following (non-marked) internal argument:

(17) Uku-nikeza aug.15-give kwa-wo 15assoc-1dem wonke 1.every umuntu aug.1person intombi aug.9girl isithombe aug.7picture sa-khe 7assoc-1pro ku-thatha sm15-take isikhathi. aug.7time 'For everyone to give the girl his picture takes a long time.'
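The contrast between the two hypotheses can be made concrete with a toy computation. The following Python sketch is purely illustrative and is not part of the analysis: the tree shapes, the node labels, and the simplified definition of c-command (first branching node, ignoring the segment/category distinction sometimes assumed for adjunction) are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A node in a toy constituency tree (labels are illustrative only)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Record the mother of each daughter so we can walk upward later.
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """Simplified c-command: neither node dominates the other, and the
    first branching node dominating a also dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


# Hypothetical adjunction structure: subject and object are both
# right-adjoined in the nominal domain, [NP [NP NP-core Subj] Obj].
subj_adj = Node("Subj-assoc")
pron_adj = Node("pronoun")
obj_adj = Node("Obj", [Node("N"), pron_adj])
adjunction = Node("NP", [Node("NP", [Node("NP-core"), subj_adj]), obj_adj])

# Hypothetical Spec,vP structure: [vP Subj [v' v [VP V Obj]]].
subj_vp = Node("Subj-assoc")
pron_vp = Node("pronoun")
obj_vp = Node("Obj", [Node("N"), pron_vp])
spec_vp = Node("vP", [subj_vp, Node("v'", [Node("v"), Node("VP", [Node("V"), obj_vp])])])

adjunction_binds = c_commands(subj_adj, pron_adj)
spec_vp_binds = c_commands(subj_vp, pron_vp)
```

Under these assumptions, `adjunction_binds` is `False` and `spec_vp_binds` is `True`, which mirrors the contrast between the unavailable bound reading in (15) and the available one in (17).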

The basic conundrum: neither the word order nor the binding facts fit with an adjunction picture for the subjects of infinitives. Instead, what we've seen in this section is that in infinitives, we find rigid VSO word order and evidence that S c-commands O. As we saw in §2.1, those are precisely the structural properties of in situ arguments in finite clauses. Based on what we've learned in this section, then, I will suggest that, despite the presence of associative morphology, the overt subject in infinitives is simply in Spec,vP.

#### **3.2 Puzzling object agreement**

The hypothesis that the overt subject in an infinitive is in Spec,vP brings with it additional predictions. Recall from §2.1 that in finite clauses in Zulu, in situ subjects block objects from moving and controlling object agreement. This agreement blocking effect contrasts with the availability of object agreement in both finite clauses with agreeing subjects *and* infinitives with no subject (as we saw in §2.2). If overt subjects in infinitives are in Spec,vP, we expect a similar object agreement blocking effect. Unlike finite clauses with low subjects, however, infinitives with overt subjects permit object agreement, as (18) below illustrates:

	- b. [Uku-wa-nikeza aug15-6om-give kwakhe 15.assoc-1pro izingane] aug.10child ku-ya-ngi-casula. sm15-dj-1sg.om-annoy 'His giving them to the children annoys me.'

The full puzzle, then, involves not only the presence of associative morphology despite the VSO word order and c-command relationship between arguments, but also the availability of object agreement despite the presence of the overt postverbal subject. In the next section, I will explore a solution that links all of these properties.

## **4 Toward an analysis**

The word order and binding facts from the previous section suggest that the external argument in Zulu infinitives is *inside* the verbal part of the infinitival clause. If infinitives involve enough clausal structure to include the external argument, why is special associative marking required on that argument, but not on the low external argument of a finite clause? Furthermore, why does the associative marker not signal the type of adjunction structure that it appears to create when marking nominal modifiers? I'll turn first to this second question, arguing that the associative here is plausibly a head in the verbal extended projection, along the lines of what has been argued for in Kinande by Baker & Collins (2006).

As Baker & Collins (2006) discuss, when multiple nominals appear in the postverbal field in Kinande, they must be separated by a so-called Linker, as illustrated in (19):

(19) Kinande (Bantu; Baker & Collins 2006: ex. 1) mo-n-a-h-ere aff-1sg.S-T-give-ext omukali 1woman *y'-* Lk.1 eritunda. 5fruit. 'I gave a fruit to a woman.'

The Linker matches in noun class with the *preceding* nominal but cliticizes to the following nominal. Baker & Collins (2006) argue that the Linker (Lk) is a head in the clausal spine between V and v that is involved in case-licensing. On their analysis, the Linker attracts either internal argument in a ditransitive like (19) to its specifier and agrees with that nominal. The verb undergoes head-movement to a position above the Linker head but does not need to move through Lk (thereby violating the Head Movement Constraint) because Lk itself is not verbal.

The possibility of Lk before a low subject suggests that LkP is perhaps a bit higher than Baker & Collins (2006) posit, at least above vP, in such cases, as illustrated by (20):

(20) Kinande (Bantu; Pierre Mujomba, p.c.) Esyóngwé aug.9wood si-ká-seny-ere sm9-T-chop-appl.pfv omo-musitu aug.18-3village *mo* Lk.18 bakali. 2women 'Women chop wood in the village.'

For Schneider-Zioga (2015a,b), the Kinande Linker is not a case-licenser, but rather a copula that can be used to mediate predication relations within a verb phrase. She argues that it appears as a last-resort mechanism when multiple arguments remain in the post-verbal field.

I believe that certain insights of these accounts can apply to the puzzle of agents in Zulu infinitives. If the associative in Zulu infinitives is a Linker-like element, following Pietraszko (2019) for Ndebele, that marks true arguments of infinitives, as the binding data in §3.1 show, then as in Kinande, it could be a head in the verbal extended projection that is "skipped" by head movement, along the lines of Baker & Collins (2006). Also like the Kinande Linker, it appears only when it is needed to "license" an external argument in an infinitive.


Why would the associative be required in infinitives with a low subject but not in their finite counterparts? Here is a sketch of a potential analysis: suppose that Zulu infinitives lack some head that helps to license the subject of a finite clause; this would be a typologically common property of infinitives (vs. finite clauses). Likely candidates for such a head in Zulu could be Voice (or possibly Pred<sup>9</sup>) or the locus of the conjoint/disjoint alternation, which I argue in Halpert (2015) helps to license the subject in a finite clause, but which does not appear in infinitives. In the absence of this relevant category, Zulu uses a Linker to mediate predication involving the external argument. As a copular element, Lk is not involved in verbal head movement (as in Kinande). Unlike in Kinande, Lk in Zulu doesn't attract a specifier. Once the infinitive is constructed, it undergoes concord with the head, as Pietraszko (2019) argues for Ndebele. The presence of the Linker head on the external argument prevents it from being a phi-goal, which allows object marking to target lower arguments.

One reason to think that Voice might be relevant in licensing subjects in infinitives comes from parallels to passive constructions with overt agents: overt external arguments in Zulu passives appear in Spec,vP, marked with the *copula* (Halpert & Zeller 2016).

(21) Zulu (Halpert & Zeller 2016: ex. 3) USipho aug.1Sipho w-a-nikez-w-a sm1-pst-give-pass *w-uMary* cop-aug.1Mary incwadi. aug.9book 'Sipho was given a book by Mary.'

In both passives and infinitives, the subject appears immediately after the verb and before other vP-internal arguments. In both, the overt subject does not block lower arguments from controlling agreement, unlike in active or finite clauses. Both constructions morphologically mark the subject by something that normally looks like a head (copula, linker).

Halpert & Zeller (2016) hypothesize that the copula in these constructions is a head in the clausal spine that gets skipped by head-movement of the verb. We hypothesize that the appearance of this morpheme on the subject renders the subject a non-intervener for object agreement. Given the findings of Schneider-Zioga (2015a,b) that the Kinande Linker is a copula, the parallels between Kinande Linkers, Zulu passive subjects, and Zulu infinitive subjects seem even more striking. While Kinande realizes Linkers and copulas with the same morphology in a variety of situations, it is possible that the difference between morphological marking of the subject in Zulu passives and infinitives could depend on the ultimate category of the clause: in a clause that is ultimately verbal (passive), the copula appears; when the clause is ultimately nominal (infinitive), the associative marker and concord obtain.

<sup>9</sup> See Zeller (2013).

## **5 Conclusion**

This paper presents an initial description and investigation of the syntactic properties of overt subjects of infinitive clauses in Zulu. I show that these constructions require VSO word order and associative marking on the subject and demonstrate that c-command relationships hold between the subject and lower arguments. I also show that the presence of an overt associative-marked subject does not prevent an object from controlling object agreement. I argue that the subject in these constructions is expressed in Spec,vP and sketch a proposal that might account for its puzzling properties.

One intriguing issue that any treatment of this phenomenon must contend with is the question of argument licensing: in a language (and broader language family) that doesn't show typical properties of (nominative) case licensing associated with finite T (Diercks 2012, Halpert 2015), what syntactic role does the associative marker play in this construction? What can we learn about argument licensing in Bantu languages from the parallels between associative, copula, and linkers discussed in §4 and cross-Bantu variation in how overt subjects of infinitives are expressed? The data and approach sketched in this paper lay out an avenue for systematic investigation of subject expression in infinitives and the behavior of copula and linker particles both within and across Bantu languages that will advance our understanding of the role that argument licensing plays in these languages.

## **Abbreviations**


## **Acknowledgments**

I am grateful to Mthuli Percival Buthlezi, Monwabisi Mhlophe, Mfundo Didi, Menzi Komo, and Thandeka Maphumulo for their assistance with Zulu data and grammaticality judgments and to Pierre Mujomba for sharing his Kinande judgments. I am also grateful to Jochen Zeller for our ongoing discussions about passives that have informed this research and to the audience at ACAL 49 for useful feedback and discussion.

## **References**


# **Chapter 12**

# **Obligatory controlled subjects in Bùlì**

## Abdul-Razak Sulemana

Massachusetts Institute of Technology

The paper argues that despite the lack of morphological marking to distinguish between finiteness and nonfiniteness, such a distinction does exist in Bùlì. It also argues that unlike the nonfiniteness of English-type languages, where nonfinite clauses take a null subject (PRO), the nonfinite clauses of Bùlì obligatorily take overt pronominals. The fact that the controlled element is overt in the language, I argue, shows that phonetic nullness is not an inherent property of the controlled element.

## **1 Introduction**

Bùlì does not have overt morphological marking to systematically distinguish finite clauses from nonfinite clauses. As such, the notions finite and nonfinite might appear not to be useful descriptive labels in the syntax and semantics of the language. Crosslinguistically, the finite-nonfinite distinction is manifested in various ways, including in the distribution of overt DPs and empty categories: finite verbs license overt DPs, while nonfinite verbs cannot do so without special mechanisms. As an illustration, consider the paradigm in (1) from English. The external arguments of the nonfinite complements, which are coindexed with a matrix argument, have to be null.

	- b. Mary persuaded John [\*he/PRO to buy a book]

The goal of this paper is twofold. First, I argue that despite the lack of morphological marking to distinguish between finiteness and nonfiniteness, such a distinction exists in the language. Second, I argue that unlike the nonfinite clauses of English-type languages, the nonfinite clauses of Bùlì obligatorily take overt pronominals which must be coindexed with a matrix argument. The rest of the paper is organized as follows: In §2, I present a brief background to the language. In §3, I present a discussion of the finite-nonfinite distinction in the language. §4 argues that the pronominal subject of the nonfinite clause must be controlled and that, apart from its overtness, it behaves like PRO. §5 argues that this pronominal is a genuine subject rather than an agreement marker. §6 discusses and concludes the paper.

## **2 Bùlì**

Bùlì is a Mabia (Gur) language spoken in Sandema in the Upper East Region of Ghana. It has three dialects: Central, Northern and Southern. This paper concentrates on the Central dialect. It is a tone language with three contrastive tones: Low, Mid and High. It is also a noun class language with five singular classes and four plural classes built around the pronouns. Its basic clause structure is SVO. The language is tenseless: the temporal interpretation of a predicate is sensitive to the eventive/stative distinction. Unmarked eventive predicates have a default past interpretation while their stative counterparts have a present interpretation, as in (2).<sup>1</sup>

	- b. Asouk Asouk sèbì know Ajohn. John 'Asouk knows John.'

The data in (2) also eliminate the potential for analyzing the low tone on the verb as the past tense morpheme, since both predicates are marked with a low tone. I will therefore consider the low tone as a form of 3rd person agreement. In the next section, I will present various arguments to show that Bùlì meets the general conditions on the finite vs. nonfinite distinction, since an adequate classification of some syntactic structures would not be achieved if such a distinction were not assumed.

<sup>1</sup> This is related to what are sometimes called factitive constructions, which are attested crosslinguistically, for example in Haitian (Déchaine 1991) and Fɔ̀ngbè (Avolonto 1992), among others. Stowell (1991) also observes that bare eventive verbs have only a past reading while bare stative verbs are interpreted as non-past in what is called headlinese.

## **3 The finite-nonfinite distinction**

Since Bùlì is a tenseless (factitive) language, notions like finite-nonfinite will appear not to be useful descriptions in the syntax and semantics of the language. Contrary to this, I present four arguments/diagnostics that will distinguish between them. These diagnostics, I argue, bring out two different kinds of nonfinite clauses: In the first kind, which I call nonfinite obligatory control complement (nonfinite-OC) illustrated in (3–4), the pronominal subject of the embedded clause must be co-indexed with a matrix argument.

	- a. Asouk Asouk tìerì remember [\*(wà /\* ) 3sg dā buy gbáŋ]. book 'Asouk remembered to buy a book.'
	- b. Núrmà people.def.pl zèrì refuse [\*(bà /\* ) 3pl dā buy gbáŋ]. book 'The people refused to buy a book.'
	- a. Mí 1sg túlím turn Asouk Asouk zúk head [\*(wà /\* ) 3sg dā buy gbáŋ]. book 'I convinced Asouk to buy a book.'
	- b. Mí 1sg túlím turn núrmà people.def.pl zúk head [\*(bà /\* ) 3pl dā buy gbáŋ]. book 'I convinced the people to buy a book.'

The second kind, which I call the nonfinite non-obligatory control complement (nonfinite-NOC) and which is illustrated in (5–6), allows a full DP in the subject position of the embedded clause. There is a further distinction between those that are not introduced by complementizers (5) and those requiring complementizers (6). The other differences between these constructions will be made clear as the discussion proceeds, since the main purpose of this section is to defend the finite-nonfinite distinction in the language.

	- a. Mí 1sg à-yā: asp-want Asouk Asouk dā buy gbáŋ. book 'I want Asouk to buy a book.'
	- a. Kù 3sg à-fɛ̄ asp-necessary ātī c Asouk Asouk dā buy gbáŋ. book 'It is necessary for Asouk to buy a book.'
	- b. Kù 3sg nālā good ātī c Asouk Asouk dā buy gbáŋ. book 'It is good for Asouk to buy a book.'

The first argument to consider for the finite-nonfinite distinction comes from the low tone (agreement) on the verb. In finite clauses, a third person subject triggers a low tone (agreement) on the verb when there are no preverbal particles intervening between the subject and the verb (7). This is the case for all 3rd person arguments in matrix as well as embedded clauses, for different DPs, including R-expressions and pronouns, and regardless of the tone on the argument. Note that the verbs of the embedded nonfinite clauses bear mid tones (see examples (3–6)).

	- b. Bí:ká child.def wa 3sg dà buy gbǎŋ. book 'S/he bought a book.'
	- c. Asouk Asouk pàchìm think wà 3sg dà buy gbǎŋ. book 'Asouk thought he bought a book.'

The second argument for treating the embedded clauses above as nonfinite clauses is based on the distribution of the future marker. In finite clauses, both matrix and embedded, the future marker is required for future interpretations. This is illustrated in (8).

	- a. Asibi Asibi àlí fut dā buy gbáŋ. book 'Asibi will buy a book.'

	- b. Asouk Asouk pàchìm think Asibi Asibi chūm tomorrow \*(àlí) fut dā buy gbáŋ. book 'Asouk thought Asibi will buy a book tomorrow.'

In contrast, the future marker is excluded from all the nonfinite clauses. The examples in (9) illustrate this point. The inability of the future marker to appear in nonfinite clauses reminds us of nonfinite clauses in Chinese which cannot take modals like *hui* 'will' (Huang 1989).<sup>2</sup>

	- a. Asouk Asouk sìak agree \*(wà /\* ) 3sg chūm tomorrow (\*àlí) fut dā buy gbáŋ. book 'Asouk agreed to buy a book tomorrow.'
	- b. Mí 1sg à-yā: asp-want Asouk Asouk chūm tomorrow (\*àlí) fut dā buy gbáŋ. book 'I want Asouk to buy a book tomorrow.'

The third argument for the finite-nonfinite distinction comes from subject questions. In-situ subject *wh*-questions in finite clauses require the presence of the particle *àlì* (glossed ali) in the clausal spine (10).

	- a. Ká q wānā who \*(àlì) ali dā buy gbáŋ book a prt 'Who bought a book?'
	- b. Asouk Asouk pàchìm think ka q wana who \*(àlì) ali dā buy gbáŋ book a prt 'Who does Asouk think bought a book?'

Although it is generally possible to question the subject of a nonfinite-NOC complement (11a–11b), questioning the subject requires the obligatory absence of *àlí*. The ungrammaticality of example (11c) shows that it is not possible to question the controlled subject of the nonfinite-OC complement. This is another difference between finite and nonfinite clauses.

	- a. Mí 1sg à-yā: asp-want ka q wana who (\*àlì) ali dā buy gbáŋ book a? prt 'Who do I want for him to buy a book?'

<sup>2</sup>Whether the future marker *àlí* in Bùlì is a modal or a tense marker is beyond the focus of this paper, however.


	- c. \* Asouk Asouk tìerì remember ka q wana who (\*àlì) ali dā buy gbáŋ? book

Is it possible that what we are questioning in (11a–11b) are arguments of the matrix predicates rather than subjects of the complement clauses, so that *àlí* is not required, since nonsubjects do not require *àlí*? This is indeed a possible analysis, especially for (11a). However, there is evidence that these arguments are subjects of the complement clauses, and as such the absence of *àlí* cannot be attributed to questioning a nonsubject argument.

Bùlì employs resumptive pronouns in long-distance extraction of a subject (12a) but not of an object (12b).

	- b. (Ká) q bʷā what \*(ātì) ati fì 2sg pá:-chīm think Asouk Asouk dìgì: cook (\*bu) 3sg ? 'What do you think Asouk cooked?'

If the questioned arguments in (11) above are objects, they should pattern with object extraction, and if they are subjects, they should pattern with long-distance subject extraction. As shown in (13), they pattern with long-distance subject extraction by requiring a resumptive pronoun.

	- b. (Ká) q wānā who \*(ātì) ati nà:wà chief.def tè give síuk path \*(wà) 3sg (\*àlì) ali dā buy gbáŋ book a? prt 'Who did the chief give permission to buy a book?'

The final argument for the finite-nonfinite distinction comes from n-word licensing.<sup>3</sup> It has been noted that NPIs and n-words differ in that NPIs can be licensed across a clause boundary, but n-words cannot. N-words in Bùlì are formed by reduplicating indefinite nouns, and they must always occur with negation regardless of their position and number.

<sup>3</sup> For more on NPIs see Zeijlstra (2017).

	- b. Wāi-wāi someone-someone \*(àn) neg1 dīg cook lām meat \*(ā). neg2 'Nobody cooked meat.'
	- c. Wāi-wāi someone-someone \*(àn) neg1 dīg cook jāab-jāab thing-thing \*(ā). neg2 'Nobody cooked anything.'

In Bùlì and other languages, including Italian and Hebrew, n-words can be licensed across the boundary of a nonfinite clause but not across that of a finite one (15).

	- b. \* Asouk Asouk àn neg1 tīeri remember āsī c wà 3sg dìg cook jāab-jāab thing-thing \*(ā). neg2 'Asouk didn't remember that he cooked anything.'

I have shown in this section that the distinction between finite and nonfinite clauses holds in the language and that the complement clauses in (3–6) are indeed nonfinite. In the next section, I argue that the nonfinite clause in Bùlì requires a pronominal subject which covaries with the number and class of the matrix argument that it is coindexed with, and that, despite its overtness, this pronominal shares all the properties of PRO.
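The four diagnostics of this section can be collected into a small lookup, given here as an informal Python sketch. It is only a mnemonic for the pattern described above and not a claim about how the grammar works: the diagnostic names, the boolean encoding, and the `classify` helper are all hypothetical labels introduced for illustration.

```python
from typing import Dict

# Value each diagnostic takes in finite clauses, as described in this section.
# Nonfinite clauses show the opposite value. The key names are hypothetical.
FINITE_PROFILE: Dict[str, bool] = {
    "low_tone_on_verb_with_3rd_person_subject": True,   # mid tone in nonfinite clauses
    "future_marker_allowed": True,                       # excluded in nonfinite clauses
    "ali_in_subject_wh_question": True,                  # obligatorily absent in nonfinite clauses
    "blocks_n_word_licensing_by_matrix_negation": True,  # licensing possible across nonfinite clauses
}


def classify(observations: Dict[str, bool]) -> str:
    """Return 'finite', 'nonfinite', or 'mixed' for the observed diagnostics."""
    unknown = set(observations) - set(FINITE_PROFILE)
    if unknown:
        raise KeyError(f"unknown diagnostics: {sorted(unknown)}")
    if not observations:
        raise ValueError("no diagnostics observed")
    matches = [observations[name] == FINITE_PROFILE[name] for name in observations]
    if all(matches):
        return "finite"
    if not any(matches):
        return "nonfinite"
    return "mixed"


# A complement clause with a mid-toned verb that excludes the future marker.
example = classify({
    "low_tone_on_verb_with_3rd_person_subject": False,
    "future_marker_allowed": False,
})
```

A `"mixed"` result would flag a clause type for which the diagnostics disagree; no such case is reported in this section.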

## **4 Obligatory controlled subjects**

In the previous section, I argued that certain clauses in the language are nonfinite. However, unlike the "regular" nonfinite clauses, the nonfinite clauses of Bùlì require an overt pronominal. In this section, I will argue that the pronominal in the embedded clauses of nonfinite-OC clauses is a subject and must be controlled by a matrix argument. As noted, the subjects of the nonfinite-OC clauses must be co-indexed with a matrix argument. In (16) the co-indexation is with a matrix subject and in (17), it is with a matrix object. Note that the pronominal also covaries with the number and class of the matrix argument it is coindexed with.

	- b. Núrmà people.def.pl bàŋ forget \*(bà /\* ) 3pl kpārī lock tóukú. door 'The people forgot to lock the door.'
	- b. Núr-wá man.def fὲ force bísáŋá children \*(bà /\* ) 3pl bāsī leave dēlā. here 'The man forced the children to leave.'

Although the subjects of these nonfinite clauses are overt, applying the diagnostics from Hornstein (1999), Landau (2013), and Williams (1980) for what are often called signature properties of PRO suggests that the overt pronominal behaves like PRO except for its overtness.

First, like PRO, and unlike pronouns, the subjects of these clauses must pick up their antecedents in the immediately preceding clauses (18). That is, just like PRO, and unlike a pronominal subject of a finite clause, the pronominal subject of the most embedded clause can only refer to *núrmà* 'the people', which is the subject of the immediately preceding clause. It cannot refer to the singular subject of the matrix clause (18a). The referential facts are different when the most embedded clause is a finite clause. As shown in (18b), the pronominal subject can freely refer to the subject of the matrix clause.

	- a. Asouk Asouk nyà realize āsī c núrmà people.def.pl tìeri remember \*wà /bà 3sg/3pl dā buy gbáŋ. book 'Asouk realized that the people remembered to buy a book.'
	- b. Asouk Asouk nyà realize āsī c núrmà people.def.pl wèin say āyīn c wà /bà 3sg/3pl dà buy gbáŋ. book 'Asouk realized that the people say that he bought a book.'

Second, coreference without c-command is not possible for this pronominal. The antecedent of a pronominal subject in nonfinite clauses must c-command it, just as with PRO (19). In (19a), *Asouk* cannot be the antecedent of the pronominal subject because it doesn't c-command it. In finite clauses, by contrast, this restriction does not hold (19b).

	- a. Asouk Asouk dóamà friend.def.pl bàŋ forget \*wà /bà 3sg/3pl kpārī lock tóukú. door 'Asouk's friends forgot to lock the door.'
	- b. Asouk Asouk dóamà friend.def.pl pàchìm think wa /bà 3sg/3pl kpàrì lock tóukú. door 'Asouk's friends thought he locked the door.'

In ellipsis contexts, the controlled pronominal must be construed sloppily. In example (20), which involves a finite complement, the pronominal can be construed strictly or sloppily. On the strict reading, Asouk said that he bought a book before Asibi said that he (Asouk) bought a book.

(20) Finite clause: the pronominal is ambiguous: strict or sloppy Asouk Asouk wìen say wà 3sg dà buy gbáŋ book àlēgē before Asibi Asibi wìen say wà/ 3sg dà buy gbáŋ. book 'Asouk said he bought a book before Asibi said that he bought a book.'

In contrast, in the nonfinite case (21), the pronominal must be construed sloppily. In (21), Asouk was the first to agree to buy the book before Asibi also agreed to buy a book.

(21) Nonfinite clause: the pronominal must be construed as sloppy Asouk Asouk sìak agree wa 3sg dā buy gbáŋ book àlēgē before Asibi Asibi sìak agree wa\*/ 3sg dā buy gbáŋ. book 'Asouk agreed to buy a book before Asibi agreed to buy a book.'

Another observation is that PRO in OC environments is interpreted as a bound variable, i.e., it must be bound by the controller. This results in the difference in interpretation between (22a) and (22b). While the pronoun in the nonfinite complement is limited to the bound variable reading in (22a), the pronoun in (22b) is not.

	- a. Wā:-wāi someone-someone àn neg1 tīeri remember wà/\* 3sg dā buy gbáŋ book a. neg2 'No one remembered to buy a book.'
	- b. Wā:-wāi someone-someone àn neg1 wēn say wà/ 3sg dā buy gbáŋ book a. neg2 'No one said that he bought a book.'

Finally, as observed by Chierchia (1989), infinitival controlled constructions are always *de se*. The pronominal subject in these complements must likewise be *de se*. This reading arises when the controller/antecedent is the subject of an attitude predicate and is aware that the complement proposition pertains to him/herself. In any situation where the attitude holder mistakes the embedded subject for someone other than him/herself, the pronominal cannot be truthfully used.

Consider the following scenario: An old man (Asouk) is listening to the credentials of three people being considered for a chieftaincy title. Not knowing that the credentials of the second person mentioned refer to him (because he hardly remembers anything), he says to his wife 'this person should be given the title'.

In this scenario, (23) is false, an outcome expected if the pronominal is an instance of a lexicalized PRO.

(23) Asouk Asouk à-zīentī eager wà 3sg chīm become nà:b. chief 'Asouk is eager to become a chief.'

It is important to note here that there have been reports in the literature that overt pronominal subjects are possible in controlled infinitives when they are focused (Szabolcsi 2009).<sup>4</sup> There is, however, solid evidence that the controlled pronominal subjects in Bùlì are not focus-marked, making them distinct from all the other cases identified where PRO is overt. Bùlì makes a distinction between weak and strong pronouns, with strong pronouns sometimes associated with focus. Weak pronouns usually have low tones. In controlled constructions, only the weak pronouns are acceptable (24a). The strong pronouns are grammatical only when they are modified by a scope-bearing element like *also/too*, similar to what Szabolcsi (2009) identified (24b).

	- b. Asouk Asouk sàik agree \*(\*wà /wá ) 3sg mɛ̄ also dā buy gbáŋ. book 'Asouk agreed to also buy a book.'

Crucially, focus is not required to overtly express the subject. This indicates that overtness of the infinitival subject does not depend on focus in this language. Thus what we uncover here is not identical to the cases identified by Szabolcsi (2009) and others.

<sup>4</sup> See also Barbosa (2009) and Madigan (2008).

## **5 The pronominal is a subject**

In the previous section, I established that the overt pronominal in the nonfinite complement clause must be controlled. An alternative view is that PRO is actually null as in other languages, and that this pronominal is an agreement marker found in nonfinite clauses, similar to what we see in languages like Brazilian Portuguese. This alternative view, though attractive, faces a number of challenges. First, analogous agreement marking is conspicuously lacking in both finite and other nonfinite clauses (25). In finite clauses in both matrix and embedded contexts, repeating the pronominal as an agreement marker results in ungrammaticality (25a–25b). Similarly, repeating the pronominal in the nonfinite clauses that permit referential DPs as in (25c–25d) is also ungrammatical.

	- b. Asouk Asouk pàchìm think Asibi Asibi (\*wà ) 3sg dà buy gbáŋ. book 'Asouk thought Asibi bought a book.'
	- c. Mí 1sg à-yā: asp-want Asouk Asouk (\*wà ) 3sg dā buy gbáŋ. book 'I want Asouk to buy a book.'
	- d. Kù 3sg à-fɛ̄ asp-necessary ātī c Asouk Asouk (\*wà ) 3sg dā buy gbáŋ. book 'It is necessary for Asouk to buy a book.'

Second, claiming that this pronominal is agreement suggests that it is not in the specifier of the embedded clause. However, the placement of adverbials in both kinds of clauses places the pronominal in the same location as matrix and embedded subjects, Spec,TP. The adverb *chúm* 'tomorrow' follows the subject in matrix clauses whether the subject is referential (26a) or pronominal (26b).

	- b. Wà 3sg chúm tomorrow àlí fut dā buy gbáŋ. book 'He will buy a book tomorrow.'

In nonfinite clauses too, the adverb follows the pronominal (27). This shows that the pronominal is in Spec, TP, just like matrix subjects, and that it is not a clitic on the verb as one might assume.

	- b. Asouk Asouk à-yā: asp-want \*(wà/\* ) 3sg chúm tomorrow dā buy gbáŋ. book 'Asouk wants to buy a book tomorrow.'

Finally, the pronominal in the nonfinite clauses can be modified just like any subject, (28).

	- b. Asouk Asouk sàik agree \*(wá/\* ) 3sg mɛ̄ also dā buy gbáŋ. book 'Asouk agreed to also buy a book.'
	- c. Asouk Asouk à-yā: asp-want \*(wá/\* ) 3sg mɛ̄ also dā buy gbáŋ. book 'Asouk wants to also buy a book.'

All these facts put together suggest that the pronominal is not an agreement marker or a clitic on the verb, but a real subject in Spec, TP.

This section has shown that the overt pronominal subject of the nonfinite complement is a subject and must be controlled by a matrix argument. Except for its overtness this pronominal shares the properties of pro, distinguishing it from the pronouns.

## **6 Discussions and conclusion**

The previous sections have established that Bùlì makes a distinction between finite and nonfinite clauses and that these nonfinite clauses require overt DPs in their specifier position. This conclusion raises a number of interesting questions for the various approaches to Control. I highlight these approaches and argue that the subjectless-based approach to control cannot be extended to Bùlì for obvious reasons. I will, however, leave open the choice between the Agree-based model and the movement-based model for future studies.

I group the approaches to Control into two:

	- i. Agree-based accounts (Landau 2001, 2013), in which the relation between the matrix argument and the embedded subject, pro (a null nominal element distinct from a trace or copy), is established via an Agree operation. On this view, pro is inherently null because of its association with infinitival T, which only assigns null Case (Chomsky & Lasnik 1993), and
	- ii. Movement-based accounts (Hornstein 1999), which consider the relation between pro and the matrix argument as involving movement. This approach accounts for the nullness of the subject by considering it as an unpronounced copy of the matrix argument.

In the previous sections, I have argued that the overt pronominal found in nonfinite complements under control shares all the properties of pro. The clear fact that nonfinite controlled complements surface with overt subjects raises interesting questions for theories of Control which deny the syntactic presence of a subject. Thus approaches to Control which take the lack of an overt subject in control complements as evidence for the lack of a subject, essentially arguing against the existence of pro, cannot be extended to Bùlì for obvious reasons. The fact that the controlled element is overt in Bùlì, I argue, shows that phonetic nullness is not an inherent property of the controlled element. Hence any approach to control that necessarily requires controlled elements to be null cannot be universal. The present data also present a challenge for standard theories of DP distribution based on abstract Case. It has been standardly assumed that DPs are licensed in structural positions where Case assignment is possible. Subject DPs are assumed to get nominative Case in finite clauses. Since the complement clauses are nonfinite, abstract Case theory predicts either that their subjects should be null or that the DPs should be getting Case from elsewhere. An open question is thus how the overt pronominal is licensed.

## **Abbreviations**


## **Acknowledgments**

For helpful discussions, I thank İsa Kerem Bayırlı, Kenyon Branan, Suzana Fong, Sabine Iatridou, David Pesetsky, Norvin Richards, Michelle Sheehan, the audience of ACAL 49, and two anonymous reviewers. Any and all errors are my own.

## **References**


# **Chapter 13**

# **The pragmatics of Swahili relative clauses**

### Mohamed Mwamzandi

University of North Carolina at Chapel Hill

Several studies explain the variation of the Swahili relative clause (RC) from a syntactic perspective. These studies discuss the derivational and structural differences/similarities between the *amba* RC and the tensed RC. In this study, the choice between the *amba* and tensed RCs is explained from a pragmatic perspective. A total of 440 RCs were extracted from the Helsinki Corpus of Swahili. The dataset was then coded for various variables including relative marker (*amba*/tensed), relative type (restrictive/non-restrictive), length (number of words used), and information status (topic/non-topic). The results show that the tensed RCs are mostly restrictive while the *amba* RCs are mostly non-restrictive. Further, the mean length of the *amba* RC is higher than that of the tensed RC. It was observed that the *amba* RC is preferred in topic shift transition, that is, when a non-topic NP becomes the topic NP in the following utterance, while the tensed RC is preferred in continue transition, that is, if the topic of the matrix clause is the same as that of the RC.

## **1 Introduction**

Several studies discuss the syntax of the Swahili tensed relative marker (RM) (1) and the *amba* RM (2). The goal of these studies is to draw a parallel between the two by explaining how the position of the RM is derivable from the same underlying position (Vitale 1981, Keach 1985) or occupies the same syntactic position (Demuth & Harford 1999, Ngonyani 2001, 2006). The examples presented in this study are from the Helsinki Corpus of Swahili (written texts in standard Swahili) unless a citation is given. The following notational conventions are followed: the RM is glossed as SRM (subject RM) or ORM (object RM) and the RC is bracketed.

Mohamed Mwamzandi. 2022. The pragmatics of Swahili relative clauses. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 227–246. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393758


In (1) the SRM *ye-*, which agrees with the head noun *askari* 'policeman' in number and noun class, is prefixed to the verb after tense while in (2) it is suffixed to *amba*. In Swahili and other Bantu languages, a predicate has a subject marker prefix that agrees with the subject in number and noun class and may also carry an object marker prefix. In (1), the subject marker prefix *a-* occurring in the verb initial position corresponds to the noun class 1 NP *askari* 'policeman' while the object marker prefix *ki-* occurring after the SRM *ye-* in (1) and after the tense marker *li-* in (2) corresponds to the noun class 7 NP *kikosi* 'squad'. I should mention that there is yet another type of RM referred to as the general relative exemplified in (3).

(3) Askari 1.policeman a-ki-ongoz-a-ye 1.sm-7.om-lead-fv-1.srm kikosi. squad 'The policeman who is leading the squad.'

The SRM *-ye* in (3) is suffixed to the verb after the final vowel. Time is unspecified in the general relative and it is used in cases interpreted as present or habitual (Ashton 1944). In this study, I explain the variation of the tensed RC and the *amba* RC from a pragmatic perspective and leave the general relative for future research. "Pragmatics" is defined as "the systematic study of meaning by virtue of, or dependent on, the use of language" (Huang 2007: 2). Notice that the tensed and the *amba* relative clauses in (1) and (2) are both grammatical and are truth-conditionally equivalent. The aim of this study is to give a pragmatic account of the variation. I should mention that most of the explanations for the variation of the Swahili *amba* and tensed RCs in this study are not absolute but statistical tendencies and should be considered in their entirety rather than individually.

Based on a few selected examples from texts, earlier studies have claimed that the *amba* RM is required in non-restrictive RCs, but this is yet to be confirmed using a large dataset (Ashton 1944, Schadeberg 1989). A restrictive RC as illustrated in (4) delimits the referent of an NP (Andrews 2007: 206). On the other hand, a non-restrictive RC as illustrated in (5) makes a comment about an NP or another constituent without delimiting its referent.


In English, the restrictive clause in (4) allows the occurrence of the complementizer *that* but the non-restrictive clause in (5) does not. It is also possible to suppress the relative pronoun in a restrictive RC as in: *The man I saw yesterday left this morning*, but not in a non-restrictive RC (Comrie 1989: 139). Further, as indicated by the comma before and after the RC, the non-restrictive clause as seen in (5) is set off intonationally from the main clause. Although the dataset in this study shows that the *amba* relative is mostly used in non-restrictive RCs, both the tensed and *amba* RCs can be used restrictively as well as non-restrictively. Thus, the restrictive and non-restrictive variable alone may not explain the Swahili *amba* and tensed RC variation. In addition to being used in non-restrictive RCs, it has been claimed that the *amba* RC when contrasted with the tensed RC is morphologically more versatile; its word order more flexible; and its structure more complex (Schadeberg 1989, Russell 1992).

I explore via corpus analysis the following questions:


I should mention that cross-linguistic studies have mostly analyzed complexity of RCs resulting from movement, locality, intervention and feature similarity as well as word order expectation (see, for example, Durrleman et al. 2016, Rizzi 2013, Levy et al. 2013). Because of the difficulty in measuring complexity in written texts whose verification needs processing tests performed on speakers, this study investigates the effect of RC length on the Swahili *amba* and tensed RC variation. It is assumed that longer sentences have more phrases, clauses and optional adjuncts that require subordination and coordination (cf. Hemforth et al. 2015, who investigate the effect of length on high attachment (first noun modification) and low attachment (second noun modification) in the processing of relative clauses with two possible antecedents in English, Spanish and French).

3. Whether the grammatical role (subject/object) of the head noun in the matrix clause and the information status of the head NP, specifically topic, affect the type of RC used.

There are some parameters that may help in identifying the topic in an utterance. These include grammatical role, pronominalization, and linear order of discourse entities. I posit that the notion of topic, defined as what the proposition is about (Gundel 1985, Lambrecht 1994), plays a role in the choice of one form of the RC over the other. Subjects are mostly topics while objects may be part of a comment in a topic-comment utterance structure. Based on qualitative and quantitative analysis of the dataset, I claim that the *amba* RC is preferred if an NP in a previous utterance is the topic of the RC in a shift transition while the tensed relative is preferred if the most salient NP (topic) of the matrix clause is also the topic of the RC.

The rest of the paper is organized as follows. In §2 I discuss previous work that explains the variation of the Swahili *amba* and tensed RCs. In §3 I briefly explain the methodology. I present the results of the study and discussion in §4 followed by the conclusion in §5.

## **2 Previous studies on the Swahili RC variation**

The Swahili RC variation has mostly been attributed to morphosyntactic restrictions on the tensed RM. While the *amba* RM may be used with all the tense, aspect and modality (TAM) markers, the tensed RC may only be used with the past tense marker *li*, the present tense marker *na*, the future tense marker *taka*, and the *si* negation marker (see Keach 1985 for a detailed description of TAM markers that require the use of *amba*). Due to the verb status of the tense markers and *amba* from a diachronic perspective, it has been argued that the two Swahili RCs are structurally the same (Vitale 1981, Keach 1985, Demuth & Harford 1999, Ngonyani 2001, 2006). A subject postposing rule in object RCs has been used as evidence to argue for the same syntactic position of the Swahili RM. Examples (6) and (7) show the *amba* and tensed object RC respectively.


The prefix *a* attached to the verb *ongoza* 'lead' corresponding to the noun class 1 NP indicates that *kisaka* is the subject of the RC while the RM *cho* corresponding to the noun class 7 NP *kikosi* indicates that the object NP marked by the prefix *ki* before the root of the verb is the relativized NP. In (6) the subject, *kisaka,* is preverbal but postverbal in (7). Notice that the head noun *kikosi* 'squad' occurs before *amba/*tense (*li*). Keach (1985) argues that the tensed RM and the *amba* RM are both independent words (verbs) and that nothing can intervene between the head noun and *amba*/tense, hence, the postverbal position of the subject in the tensed object RC in (7). However, the structural similarities/differences may not explain why one of the RCs may be chosen instead of the other in discourse texts.

Russell (1992: 125–126) argues that *amba* is used in object RCs to disambiguate the subject and object NPs if they both belong to the same noun class as illustrated in (8) and (9).


The NPs *mwivi* 'thief' and *mtoto* 'child' agree in number and class with the verb *a-li-mu-ona* and can therefore be both interpreted as subject/object in (8) since Swahili allows both the SVO and OVS word order. The use of *amba* licenses the occurrence of the subject in its canonical preverbal position as seen in (9) and therefore enhances the interpretation of *mtoto* 'child' as the subject of the RC. However, the disambiguation justification for the choice of *amba* is limited in its application since cross-reference can establish the subject and object grammatical roles in situations where the two belong to different noun classes as seen in (10).

(10) Mwivi 1.thief wa-li-ye-mu-ona 2.sm-pst-1.orm-1.om-see watoto. 2.children 'The thief who the children saw.'

In (10), the subject of the RC is *watoto* 'children' because the subject marker *wa* on the verb agrees with the noun class 2 NP *watoto* 'children'. Furthermore, as was mentioned by an anonymous reviewer, although *amba* is used in (11) the subject of the sentence remains ambiguous if the NP *mtoto* 'child' occurs after the verb.

(11) Mwivi 1.thief [amba-ye amba-1.orm a-li-mw-ona 1.sm-pst-1.om-see mtoto]. 1.child 'The thief whom the child saw.'

Considering examples such as (10) and (11), the choice of the *amba* relative may not necessarily be to disambiguate the subject/object grammatical roles. Furthermore, the grammatical role (subject/object) and function (agent/theme) of discourse entities belonging to the same/different noun classes can be identified if adequate contextual background is available. In texts, discourse entities are linked via referring expressions and therefore parameters such as linear order of utterances, subjecthood and pronominalization can help identify the intended referents as well as their grammatical role and information status (cf. Grosz et al. 1995 and Grosz & Sidner's 1998 Centering Theory, which attempts to determine topic in discourse texts).

In Swahili, it has been claimed that the *amba* RC is preferred if the RC in question is complex as illustrated in (12) and (13).


Following Keenan & Comrie's (1977) Accessibility Hierarchy of relativization, it has been claimed that the *amba* RC in (12) is preferred instead of the tensed RC in (13) since the relativized NP is the object (which is lower on the Accessibility Hierarchy than the subject). Based on the dataset used in this study, I argue that the choice of the *amba* RM in (12) is influenced by the length of the RC rather than the position of the head noun in the Accessibility Hierarchy. Notice that the RC in (12) contains two clauses with 8 words. The *amba* relative clause may be used by the speaker to "compensate for (perceived) difficulties in producing or parsing" long relative clauses, whether subject or object RCs (Green 1996: 133).

In these syntactic studies on Swahili RC, the structural differences/similarities are the focus and their usage by speakers is regarded as a case of free variation. However, a combination of pragmatic factors such as preference for the tensed RC in restrictive RCs, length of the RCs, and information structure considerations may better explain the choice of one of the RC variants in natural language.

## **3 Methodology**

The source of data in this study was the Helsinki Corpus of Swahili (HCS), which has 25 million words. The corpus contains Swahili newspaper articles as well as excerpts of literary texts and education and science material written from the mid-20th century to 2015. In the HCS, concordance searches are done via a built-in tool, namely *Korp*. *Korp* displays a concordance list of the query as well as the immediate context of the search expression. Every word in the Helsinki corpus is annotated with its part of speech (for example, verb, noun, adjective), morphological description (for example, relative marker, reciprocal, tense), syntactic function (for example, subject, main verb, object), and gloss in English. The tensed RCs were displayed using an extended search that instructed *Korp* to look for words whose part of speech was V (verb) and whose morphology included REL (relative marker). The *amba* RCs were displayed using a search that queried for all occurrences with *amba* as the base form. Only TAM markers that allow the occurrence of both the *amba* and tensed RCs were selected. Further, the RCs were selected such that they represent the diverse sources in the corpus, which include novels by different authors, newspapers, and academic articles on literary works. In total, 440 RCs were selected from the HCS; 285 were tensed RCs and 155 were *amba* RCs.
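The two searches described above can be pictured as simple filters over annotated tokens. The following is a minimal sketch only: the dictionary keys (`word`, `lemma`, `pos`, `msd`), the tag values, and the sample tokens are hypothetical stand-ins for the corpus annotation, not the actual HCS attribute names or *Korp* query syntax.

```python
# Hypothetical token annotations; keys and tag values are illustrative
# assumptions, not the real HCS/Korp attribute names.
tokens = [
    {"word": "aliyekiongoza", "lemma": "ongoza", "pos": "V", "msd": "SM PAST REL OM"},
    {"word": "ambaye", "lemma": "amba", "pos": "PRON", "msd": "REL"},
    {"word": "kikosi", "lemma": "kikosi", "pos": "N", "msd": "7-SG"},
]


def is_tensed_rc_verb(token):
    """Verb whose morphological description includes a relative marker."""
    return token["pos"] == "V" and "REL" in token["msd"].split()


def is_amba_rm(token):
    """Any occurrence whose base form is amba."""
    return token["lemma"] == "amba"


tensed_hits = [t["word"] for t in tokens if is_tensed_rc_verb(t)]
amba_hits = [t["word"] for t in tokens if is_amba_rm(t)]
```

Requiring the part of speech to be V in the first filter keeps the *amba* forms, which also carry a relative marker, out of the tensed set.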

Recall that *Korp* displays the target utterances in their immediate context. The utterances that occur within the context of the target RC provide contextual clues that help in the pragmatic analysis of the Swahili RC variation. Each of the 440 RCs was coded as restrictive or non-restrictive. As explained earlier, a restrictive relative gives specific information that delimits a referent while a non-restrictive clause gives more (non-identifying) information about a referent. The number of words within each RC was then counted to measure its length. After identifying the discourse entities in the target utterance and those occurring before and after it, the RC was coded for relativization (subject/object), the grammatical role of the head noun in relation to the matrix clause (subject/object/PP complement), and the information status of the head NP (topic/non-topic).
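The coding scheme can be thought of as one record per RC. The sketch below is illustrative only: the class and field names are assumptions introduced here, and length is taken to be the number of whitespace-separated words inside the RC brackets. The two sample records use the RCs of examples (14) and (18) in §4.1, with morpheme hyphens removed; fields that the text does not specify for these examples are left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodedRC:
    """One coded relative clause; field names are illustrative assumptions."""
    rc_text: str                          # words inside the RC brackets
    marker: str                           # "amba" or "tensed"
    rel_type: str                         # "restrictive" or "non-restrictive"
    relativized: str                      # "subject" or "object"
    head_role: Optional[str] = None       # "subject", "object" or "PP complement"
    head_is_topic: Optional[bool] = None  # information status of the head NP

    @property
    def length(self) -> int:
        """RC length as the number of whitespace-separated words."""
        return len(self.rc_text.split())


# RCs of examples (14) and (18); uncoded fields stay None.
rc14 = CodedRC("anayesinzia", "tensed", "restrictive", "subject")
rc18 = CodedRC("ambayo yanafanya shairi liitwe la Kiswahili",
               "amba", "restrictive", "subject")
print(rc14.length, rc18.length)
```

Under this way of counting, the RC in (14) is one word long and the RC in (18) is six words long, matching the lengths given for these examples in §4.1.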

## **4 Results and discussion**

The study aimed at investigating pragmatic reasons for the variation of Swahili *amba* and tensed RCs. I discuss the results in the following order. In §4.1, I discuss the role of the restrictive/non-restrictive distinction in the *amba* RC and tensed RC variation. In §4.2, I discuss the effect of the RC length on the choice of the *amba*/tensed RC. In §4.3, I show that information structure, hitherto unexplored, may also have an impact on the Swahili RC variation.

#### **4.1 Variation due to restrictive/non-restrictive use of RCs**

If the variation of the *amba* and tensed RCs is due to the restrictive and non-restrictive use of the RCs, I expect the dataset used in the study to have a higher frequency of *amba* non-restrictive RCs than of tensed non-restrictive RCs. On the other hand, I expect a higher frequency of tensed restrictive RCs than of *amba* restrictive RCs. Of the 155 *amba* RCs, 32 were restrictive and 123 were non-restrictive. As for the 285 tensed RCs, 245 were restrictive and 40 were non-restrictive. Figure 1 summarizes these results.

Figure 1: *amba* and tensed restrictive and non-restrictive clauses
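As a worked check on these counts, the sketch below cross-tabulates them as a 2 × 2 table (RM by relative type) and computes a Pearson chi-squared statistic. It assumes Yates' continuity correction, which the text does not mention but which reproduces the value of 180.89 reported for this test; the function name `chi2_2x2` is introduced here for illustration.

```python
def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-squared statistic for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction is applied
    (an assumption; the text does not say which variant was used).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: amba, tensed; columns: restrictive, non-restrictive (counts from the text).
corrected = chi2_2x2(32, 123, 245, 40)
uncorrected = chi2_2x2(32, 123, 245, 40, yates=False)
print(round(corrected, 2))  # 180.89
```

Without the correction the statistic is somewhat larger, but the conclusion is the same either way: with one degree of freedom, both values lie far beyond the critical value for p < 0.001.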

A chi-squared test shows that the difference in frequency between restrictive and non-restrictive clauses for the *amba* and tensed RCs is significant, $\chi^2(1, N = 440) = 180.89$, $p < 0.001$. The implication is that the tensed RC is used more frequently as restrictive while the *amba* RC is used more frequently as non-restrictive. I present an example of a tensed restrictive RC in (14).

(14) Bwana 1.man yule that [a-na-ye-sinzia]. 1.sm-prs-1.srm-doze 'That man (over there) who is dozing.'

In (14), the speaker points at a specific man who was dozing amongst other men. The RC gives information that helps in delimiting the referent of the referring expression *bwana yule* 'that man'. Although the tensed RC was mostly used restrictively, there are a few instances of tensed non-restrictive RCs, as illustrated in (15).

(15) A-li-m-rud-i-a 1.sm-pst-1.om-return-appl-fv Adili 1.Adili [a-li-ye-kuwa 1.sm-pst-1.srm-aux a-na-pata 1.sm-prs-get fahamu]. 9.consciousness 'He got back to Adili, who was regaining his consciousness.'

The RC in (15) does not delimit the referent but gives more information about the topic of the RC, *Adili*. A close analysis of the non-restrictive tensed RCs shows that they mostly occur if the referent of the head noun is a proper noun as is the case in (15). There are also cases when a non-restrictive tensed RC gives more information that establishes the identity of a proper noun, as seen in (16).

(16) Krapf Krapf [a-li-ye-li-tumikia 1.sm-pst-1.srm-5.om-serve dhehebu 5.denomination la of C.M.S]. C.M.S.] 'Krapf, who served the C.M.S Christian denomination.'

In (16), the head noun *Krapf* is the subject of the RC. The RC presents more specific information that may assist the hearer in identifying the intended referent.

While the dataset shows that the tensed RC mostly gives information that delimits the referent in restrictive RCs or gives identifying information in non-restrictive usage, the *amba* RC adds new non-identifying information predicated on the head noun as seen in (17).

(17) A-li-kuwa 1.sm-pst-aux a-me-va-li-a 1.sm-pfv-wear-appl-fv rubega 9.cloak nyeupe white [amba-yo amba-9.srm i-li-anza 9.sm-pst-start ku-poteza inf-lose weupe whiteness wake]. its

'He was wearing a white cloak that had started to lose its whiteness.'

In (17), the RC gives more information about the head noun, *rubega nyeupe* 'white cloak', whose whiteness had faded. The RC in (17) may be analyzed as an utterance with a topic-comment structure rather than an NP in a coreferential relationship with its antecedent. Although the *amba* RC is mostly non-restrictive, there are instances when the *amba* RC may be used restrictively as seen in (18).

(18) Mambo 6.things muhimu important [[amba-yo amba-6.srm ya-na-fanya]<sup>1</sup> 6.sm-prs-make [shairi 6.poem li-it-w-e 6.sm-call-pass-sbjv la of Ki-swahili]<sup>2</sup> ]. 7-Swahili 'Important things that make a poem be called a Swahili poem.'

In (18) the *amba* RC is restrictive because the head noun *mambo muhimu* 'important things' is defined by the RC as a set of characteristics that make a poem be called a Swahili poem. Notice that compared to the tensed restrictive clause presented in (14), the *amba* restrictive clause in (18) is longer. The tensed RC is one word long (one clause) while the *amba* RC is six words long (two clauses), and both are restrictive subject RCs. I present the results of the effect of length on Swahili RC variation in §4.2.

#### **4.2 Length of the relative clause**

In this section, I discuss the effect of the RC length, measured in number of words, on Swahili *amba* RC and tensed RC variation. I claim that the *amba* RC is more frequently used if the RC is long, while the tensed relative is used more frequently in short RCs as illustrated in (19) and (20).

	- b. Mwanamke 1.woman mjane 1.widow a-li-ye-kuwa 1.sm-pst-1.srm-be jirani 1.neighbor yao. their 'A (woman) widow who was their neighbor.'

'A (woman) widow who was their neighbor long ago.'

	- b. Mwanamke 1.woman mjane 1.widow a-li-ye-kuwa 1.sm-pst-1.srm-be jirani 1.neighbor yao their hapo long zamani. ago 'A (woman) widow who was their neighbor long ago.'

The RCs in (19a–b) are shorter than the RCs in (20a–b) because of the addition of the adverbial phrase *hapo zamani* in the latter. Regardless of the grammatical role of the head noun, the dataset indicates that the *amba* RC was mostly used in cases where the RC in question was long as seen in (20). I claim that the *amba* relative is used more frequently in such cases due to its simplifying effect in parsing or producing long RCs.

Table 1 presents the mean and standard deviation (in parentheses) of the length of the *amba* RCs and tensed RCs (restrictive and non-restrictive).


Table 1: Mean and sd of RC length in restrictive and non-restrictive clauses

The mean length of the *amba* RCs is higher (about 7 words) than that of the tensed RCs (about 4 words). A non-repeated measures ANOVA with RC length as the dependent variable, and RM (*amba*/tensed) and relative type (restrictive/non-restrictive) as the independent variables, reveals significant main effects on RC length of RM, $F(1, 440) = 197.49$, $p < 0.001$, and of relative type, $F(1, 440) = 11.11$, $p < 0.001$. These results indicate that the *amba* RC is preferred in long RCs while the tensed RC is preferred in short RCs.

In addition to its simplifying effect in parsing and production of long RCs, the preference for *amba* in such situations is also due to its ability to allow for more types of phrases to occur in different orders as illustrated by the adverbial phrase (AP) *kwa makusudi* 'deliberately' in (21–24).


(24) Wa-po 2.sm-16.loc wananchi 2.citizens [wa-na-o-ziba [2.sm-prs-2.srm-block mifereji 4.taps kwa with makusudi]. deliberate]

In (21), the *amba* RC must be used because of the fronted AP *kwa makusudi* 'deliberately'. The use of the tensed RC with the AP intervening between the head noun *wananchi* 'citizens' and the RM in *wa-na-o-ziba* renders (22) ungrammatical. Both the *amba* RC and the tensed RC are grammatical if the adverbial phrase is post-verbal as in (23) and (24). The different word order with the AP in sentence final position may however change the intended effect of the utterance as stipulated by information structure considerations discussed in §4.3.

#### **4.3 Effect of grammatical role and topic**

I claim that in addition to the restrictiveness and RC length, the use of the *amba*/ tensed RC may also be influenced by information structure considerations. Information structure, a term originally introduced by Halliday (1967), deals with formal properties of semantically equivalent but pragmatically divergent propositions in their textual environment (Lambrecht 1994). The notion of topic, defined as what the sentence is about, has a role in the choice of Swahili RC constructions. Under information structure, utterances are organized into two parts. In the unmarked form, the first part, the topic, is given or presupposed information while the second part, comment, the predicate and its internal argument(s) and adjunct(s) is the new or informative part of the utterance (Prince 1981, 1992, Gundel & Fretheim 2006). To investigate the role of grammatical role and topic in the variation of the tensed and *amba* RCs, I coded the relativization of the head noun (subject/object/prepositional phrase (PP)). Based on its grammatical role and other contextual clues including pronominalization and referential chain (previous or later mention), the head noun was coded as topic/non-topic. This was possible because each utterance has exactly one topic which is the most salient member of the referents realized as pronoun, explicit referring expression, zero or "inferred" (as used by Prince 1981). Salience "defines the degree of relative prominence of a unit of information, at a specific point in time, in comparison to other units of information" (Chiarcos et al. 2011). The concept of topic as the most salient entity of an utterance is illustrated in (25).

(25) a. Nyumba 9.house [a-li-yo-kaa 1.sm-pst-9.orm-stay Beneti]. Beneti 'The house in which Beneti stayed.'


In (25), there are three clauses labeled as a, b and c. In (25a), the subject of the RC is the postposed NP *Beneti* and the relativized NP *nyumba* is the object. Although it is possible to use the *amba* RM in (25a), I argue that the tensed RC is chosen so that the subject NP *Beneti* is postposed to place *nyumba* 'house' as the sole preverbal NP to make it easier for the hearer to parse *nyumba* as the topic of the matrix clause. The head noun *nyumba* 'house' is the subject of the utterance in (25) and therefore occurs in (25b–c) as a pronominal subject prefix attached to the verb. The most salient information unit of the matrix clause is therefore the house because of its occurrence as the sole NP in preverbal position and its pronominalization.

A restrictive clause is analyzed as a modifier of the head noun and therefore an element within a noun phrase structure of a subject, object or PP complement. On the other hand, a non-restrictive clause is analyzed as an independent clause with its own locus for topic update. The head noun may be the topic/non-topic of the RC in question. While (25) shows a head noun, *nyumba* 'house', that is the topic of the matrix clause, (26) shows a head noun that is non-topic.

(26) Adili 1.Adili a-li-vunja 1.sm-pst-break amri 9.command [a-li-yo-p-ew-a]. 1.sm-pst-9.orm-give-pass-fv 'Adili broke the command he was given.'

In (26), the head noun *amri* is the object of the matrix clause and together with the RC is an internal argument of the verb *vunja* 'break'. The topic of the matrix clause is the subject NP *Adili* in the left periphery. The occurrence of the NP *Adili* as a pronominal subject marker prefix within the RC provides further evidence for the topic status of the NP *Adili*. In instances with no topic shift such as (25) and (26), the tensed RC is more frequently used regardless of the grammatical role of the head noun. In both instances, the head noun is the object of the RC, an RC that is predicted to be complex by the Accessibility Hierarchy (Keenan & Comrie 1977), so the *amba* RC would be expected to be more appropriate due to its simplification effects (Ashton 1944, Schadeberg 1989). Notice that the tensed RCs in (25) and (26) are short (one word long with no subordination or coordination). I argue that the use of the tensed clause instead of the *amba* RC is due to the length of the RCs as explained in §4.2 and the information status of the head noun.

Table 2 shows the results of the effect of grammatical role and topic on the frequencies of *amba* and tensed restrictive/non-restrictive RCs. Row one, for example, shows the frequencies of the *amba* RCs and tensed RCs (restrictive and non-restrictive) when the head noun (HN) is the subject (topic) of the matrix clause and is also the topic of the RC in question.


Table 2: Effect of grammatical role and topic on Swahili RC variation

The frequency of head NPs that were the topic of the matrix clause and were continued as the topic of the tensed RCs was higher (70) than that of head nouns that were topics in the matrix clause as well as the *amba* RCs (28). The frequency difference is significant, $\chi^2(1, N = 98) = 18$, $p < 0.001$. Example (27) shows a tensed RC whose head noun is the topic of the matrix clause as well as the RC.

	- this

'The writers who followed have abandoned this style (of writing).'

In (27), the head noun *waandishi* 'writers' is the subject (topic) of the matrix clause and is continued as the topic of the RC via the subject marker *wa*. RCs that delimit the referent of a topic NP in a continue transition are mostly tensed. However, the *amba* RC may be used in long RCs even if the head noun is subject of the matrix clause and topic of the RC in question as discussed in §4.2.
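The frequency comparison for topic-continuing RCs (70 tensed versus 28 *amba*) can be checked with a short calculation. The sketch below assumes that the reported test is a chi-squared goodness-of-fit test against equal expected frequencies; the text does not state this, but the assumption reproduces the reported value of 18 with one degree of freedom. The function name is introduced here for illustration.

```python
def chi2_goodness_of_fit(observed):
    """Chi-squared statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Topic-continuing RCs: 70 tensed versus 28 amba (counts from the text).
statistic = chi2_goodness_of_fit([70, 28])
print(statistic)  # 18.0
```

Each cell has an expected frequency of 49 and deviates from it by 21, so each contributes 441/49 = 9 to the statistic.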

There was a total of 101 RCs (65 *amba* and 36 tensed) whose head nouns were objects of the matrix clause but topics of the RC in question. On the other hand, there were 126 RCs (27 *amba* and 99 tensed) whose head nouns were objects of the matrix clause but non-topics within the RC. The *amba* RC was used to enhance a shift transition in the event that the object NP became the topic of the RC, as illustrated in (28).

(28) [Waandamanaji wa-li-kaidi amri hiyo]<sup>1</sup> [na ku-wa-rush-i-a mawe polisi]<sup>2</sup> [amba-o wa-li-jibu mapigo]<sup>3</sup> [kwa ku-fyatua risasi za moto]<sup>4</sup>.

2.demonstrators 2.sm-pst-disobey 9.command that and inf-om-throw-appl-fv 6.stones 2.policemen amba-2.srm 2.sm-pst-respond 6.beatings by inf-shoot 10.bullets of 3.fire

'The demonstrators disobeyed the command and threw stones at the policemen, who responded to the beatings (of stones) by shooting live bullets.'

In (28), there are four clauses numbered 1–4. The topic of clauses 1 and 2 is *waandamanaji* 'demonstrators' while the topic of clauses 3 and 4, the *amba* RC, is *polisi* 'policemen', the object of the matrix clause. Since the NP *polisi* 'policemen' is first mentioned as the object of the matrix clause in clause 2 and then functions as the subject of the following clause, the *amba* RC, it becomes the most salient NP in that clause due to its subject role and givenness (previous mention). The *amba* clause is preferred in a shift transition because the previously mentioned object NP now has topic status, and *amba* assists in parsing the RC as a separate utterance with a topic-comment structure. Although the frequency of the *amba* RCs used in shift transition is significantly higher than that of the tensed RCs, the tensed RC may also be used, as shown in (29).

(29) Bali Rehema a-li-po-nyanyua macho ku-m-tazama Bikiza [a-li-ye-leta taarifa ya kifo]...

but 1.Rehema 1.sm-pst-when-raise 6.eyes inf-2.om-look 1.Bikiza 1.sm-pst-1.srm-bring 9.message of 7.death

'But when Rehema raised her eyes to look at Bikiza who brought the message of death...'

In (29), *Bikiza* is the object of the matrix clause but the topic of the tensed RC. I mentioned earlier that the tensed non-restrictive RC occurs in contexts where the head noun is a proper noun. This may explain the choice of the tensed RC here, but I should reiterate that the pragmatic explanations in this study are not absolute. Rather, they are statistical tendencies that cannot be attributed to chance.

Forty-one head nouns occurring as PP complements within the subject/object NP of the matrix clause were RC topics while 68 were non-topics. The frequency difference between the *amba* and tensed RCs when the head NP was topic was not significant, *p* > 0.05. However, the frequency difference in non-topic head nouns was significant (χ<sup>2</sup>(1, *N* = 77) = 31, *p* < 0.001). In (30), the *amba* RC is used in a shift transition, that is, the head noun is not the topic of the matrix clause but is the topic of the RC.

(30) A-li-anza ku-safiri ili ku-epuka ghadhabu ya ndugu yake amba-ye a-li-u-kosa u-rithi huo.

1.sm-pst-begin inf-travel so.that inf-avoid 9.anger of 1.sibling his amba-1.srm 1.sm-pst-11om-miss 11-inheritance that

'He started traveling to avoid the anger of his sibling who had missed the inheritance.'

In (30), the head noun *ndugu yake* 'his sibling' is the complement of the preposition *ya* 'of'. The subject of the matrix clause is a discourse entity occurring as the subject pronominal NP *a* (the co-referential NP is not overt in the example), while the subject of the *amba* RC is *ndugu yake* 'his sibling', occurring as the SRM *ye* and the pronominal NP *a* within the RC. The use of *amba* in (30) enhances the parsing of the RC as an utterance with its own locus for update rather than a delimiting phrase of the object NP. A close analysis reveals that tensed RCs whose head is topic of the RC but a PP complement of the matrix subject/object NP were mostly predicated by intransitive or passivized verbs, as illustrated in (31) and (32).

(31) A-li-ji-ona katika nyumba kubwa sana i-li-yo-kuwa n-zuri ajabu.

1.sm-pst-refl-see in 9.house 5.big very 9.sm-pst-9.srm-be 9-beautiful surprisingly

'He found himself in a very big house that was surprisingly beautiful.'

(32) Katika hukumu i-li-yo-to-lew-a jana na Jaji Bernard Luand.

in 9.ruling 9.sm-pst-9.srm-give-pass-fv yesterday by 1.judge Bernard Luand

'In a ruling that was given yesterday by Judge Bernard Luand.'

In (31) the subject/topic of the tensed RC is the NP *nyumba kubwa sana* 'a very big house', which is a complement of the preposition *katika* 'in'. In (32) the head noun is the NP *hukumu* 'ruling', also occurring after the preposition *katika* 'in'. As for the head nouns which were PP complements of subject/object NPs of the matrix clause, the dataset generally indicates that the tensed RC is the most frequently used variant when the head noun is a non-topic.

## **5 Conclusion**

A combination of factors including restrictive/nonrestrictive use of the RC, length of the relative, grammatical role of the head noun and information structure may explain the choice of one of the Swahili RC variants in cases that allow both the *amba* and tensed RC, specifically, when the past, present and future tense markers are used. The dataset suggests that all things being equal, the tensed RM is mostly used if the RC is restrictive while the *amba* RM is mostly used if the RC is non-restrictive. Further, the tensed RC can mostly be analyzed as a modifier of the head noun. On the other hand, the *amba* RC presents new information predicated on the relativized NP. The dataset also shows that the mean length of the *amba* RC is significantly higher than that of the tensed RC. The implication is that the *amba* RC contains more discourse entities and words in subordinated and coordinated structures. This calls for further processing and reading experiments to find out whether the *amba* RM is used in RCs to compensate for difficulty in production and parsing of longer utterances. Another factor that impacts on the choice of Swahili RCs is the grammatical role and information status of the head noun and other discourse elements within the RC. The *amba* RC is more frequently used in shift transition while the tensed relative delimits references of topic/non-topic discourse entities in continue transition. It is also possible that the tensed RC is preferred if the predicate is intransitive in shift transition, but this observation needs further research.

### **Source**

Helsinki Corpus of Swahili 2.0 (HCS 2.0). 2014-05-09. User support at CSC – IT Center for Science Ltd. The Language Bank of Finland (distributor). Etsin research data finder, 2018. http://urn.fi/urn:nbn:fi:lb-2014032624

## **Abbreviations**


## **References**


# **Chapter 14**

# **Unifying prolepsis and cross-clausal cliticization in Lubukusu**

## Lydia Newkirk

Rutgers University

This paper examines proleptic constructions in Lubukusu (Bantu, Western Kenya). I find that Lubukusu has two distinct strategies for prolepsis: one where the extra matrix nominal is base-generated high, and one where the nominal moves from the embedded clause to the matrix position. The latter is subject to island effects, whereas the former is not. I propose an analysis for these two kinds of prolepsis based on these facts, dependent on the particularities of what nominals can be licensed in what syntactic positions in Lubukusu, and explore the cross-linguistic implications of this analysis.

## **1 Background**

Before proceeding to the main description and analysis of Lubukusu prolepsis, it will be useful to briefly introduce prolepsis as a phenomenon and to provide some preliminary background on Lubukusu as a language.

#### **1.1 Prolepsis**

Prolepsis, as characterized in Salzmann (2017), is a multiclausal construction in which a verb that normally takes an embedded finite clause apparently takes an additional nominal argument (the proleptic object) in the matrix clause, often accompanied by a preposition as in German (1), but sometimes licensed with case marking, as in Middle Dutch (2).

Lydia Newkirk. 2022. Unifying prolepsis and cross-clausal cliticization in Lubukusu. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 247–264. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393760


Prolepsis has most typically been analyzed as involving base-generation of the proleptic object either in the matrix clause (as in Salzmann 2017) or in the left periphery of the embedded clause (as in van Koppen et al. 2016), with an obligatory coreference requirement between the proleptic object and some pronominal in the embedded clause. Prolepsis without such an embedded pronominal is degraded, as in the English example below:

(3) ?? Mary thinks of dinner that John will cook fish tonight.

#### **1.2 Lubukusu**

Lubukusu (Bantu JE31c, Western Kenya) utilizes a set of prefixes on its verbs to indicate the noun class of its subject (4).<sup>1</sup>

(4) Wekesa *a*-a-kul-a sii-tabu.

Wekesa sm.c1-pst-buy-fv c7-book

'Wekesa bought a book.' (Wasike 2006: ex. 11a)

Lubukusu also has a set of object-marking prefixes, but in neutral contexts they cannot cooccur with an overt object, unless that object is a pronoun. This has led Diercks & Sikuku (2015), Sikuku et al. (2018) to analyze the object marker as an incorporated pronoun/clitic, rather than an agreement morpheme.

<sup>1</sup>Many of the Lubukusu examples in this paper are from the Afranaph Project. For those sentences I have marked their sentence ID for lookup in the Afranaph database. Other examples are drawn from the Lubukusu literature and are marked accordingly. Examples without an accompanying citation are from my own field work. I am indebted to Dr. Justine Sikuku for his patience and assistance in providing me with the data.

(5) N-a-*mu*<sup>i</sup>-bon-a (#Wekesa<sup>i</sup>).

1sg.s-pst-om.c1-see-fv Wekesa

'I saw him.' (Diercks & Sikuku 2015: ex. 2)

The third person pronominal *niye* *can* cooccur with verbal object marking, however:

(6) Wekesa a-a-*mu*<sup>i</sup>-p-a (*niye*<sup>i</sup>).

Wekesa sm.c1-pst-*om.c1*-beat-fv *him*

'Wekesa beat him.' (Afranaph ID: 3734/5039)

This is in line with the generalizations in Anagnostopoulou (2016, 2017), in that even languages which do not allow clitics to double full DP objects allow doubling for overt object pronouns.

Lubukusu also marks reflexivity on the verb, where an invariant reflexive marker (refl) occurs in the same position as the OM. A pronoun which takes noun class agreement matching its antecedent may also cooccur with the refl, which surfaces as *i-* regardless of the noun class of its antecedent. The refl alone is sufficient to establish reflexivity, so the agreeing anaphor is optional. The Lubukusu refl is also analyzed as an incorporated pronoun in line with Lubukusu's object markers, given its similar syntactic behavior.

(7) Yòháná<sup>i</sup> á-á-*i*<sup>i</sup>-bon-a (o-mu-eene<sup>i</sup>).

Yohana sm.c1-pst-*refl*-see-fv c1-c1-own

'John<sup>i</sup> saw himself<sup>i</sup>.' (Afranaph ID: 1248/1249)

The agr-*eene* pronoun can also occur without an accompanying refl, in which case it cannot take a local antecedent, but is allowed to take a discourse antecedent:

(8) Billi<sup>i</sup> a-a-bon-a o-mu-eene<sup>k/\*i</sup>.

Billi sm.c1-pst-see-fv c1-c1-own

'Bill<sup>i</sup> saw him<sup>k/\*i</sup>.' (ID: 1367)

(9) Jack<sup>i</sup> a-many-il-e a-li George<sup>j</sup> a-*mu*<sup>i/k</sup>-siim-a *o-mu-eene*<sup>i/k</sup>.

Jack sm.c1-knows-tns-fv c1-that George sm.c1-*om.c1*-like-fv *c1-c1-own*

'Jack<sup>i</sup> knows that George<sup>j</sup> likes him<sup>i/k</sup>.'

These pieces in place, I now proceed to give a description of prolepsis in Lubukusu.

## **2 Prolepsis in Lubukusu**

In Lubukusu, there are three ways to license a proleptic object: first, a proleptic object can be introduced with a preposition (10), as is the case in English. Second, there is an equivalent construction with an applicative morpheme (11).


Third, it is also possible for a proleptic object to be a reflexive pronoun, in German, English, and Lubukusu, but crucially the Germanic cases still require that a preposition introduce the proleptic object, while in Lubukusu the preposition is optional (14):


(15) Jack<sup>i</sup> a-*i*<sup>i</sup>-kanakan-il-a *o-mu-eene*<sup>i</sup> a-li Lisa a-many-il-e a-li Wendy a-mu<sup>i</sup>-siim-a *o-mu-eene*<sup>i</sup>.

Jack sm.c1-refl-think-appl-fv c1-c1-own c1-that Lisa sm.c1-know-tns-fv c1-that Wendy sm.c1-om.c1-like-fv c1-c1-own

'Jack<sup>i</sup> thought for himself<sup>i</sup> that Lisa thinks that Wendy likes him<sup>i</sup>.'

In (14), there is no agr-*eene* in the matrix clause, as the invariant refl suffices to mark reflexivity, though (15) demonstrates that agr-*eene* can occur both in the embedded clause and in the matrix clause. In (13) however, there is no refl on the matrix verb, and instead there is an overt proleptic object in the matrix clause, which does not participate in clitic doubling on the matrix verb, and has an (optional) embedded resumptive pronoun. Similar constructions are possible with a matrix (third person, non-reflexive) object marker rather than the reflexive marker, although it is degraded when the embedded object marker is in object position:


Constructions with *khu-mu-eene* in the matrix clause are insensitive to locality, whereas the construction with the refl/om cliticized to the matrix verb is sensitive to island boundaries, here shown with the refl on the matrix verb:

(19) John<sup>i</sup> a-lom-a *khu-mu-eene*<sup>i</sup> a-li Bill a-khaenj-a [o-mu-undu o-wa-mu-lip-a *o-mu-eene*<sup>i</sup>].

John sm.c1-say-fv prep-c1-own c1-that Bill sm.c1-look.for-fv c1-c1-person wh-c1-om.c1-pst-pay-fv c1-c1-own

'John<sup>i</sup> said about himself<sup>i</sup> that Bill is looking for the person who paid himself<sup>i</sup>.'


And similarly with the OM on the verb, embedding agr-*eene* inside of an island is degraded:

(25) \* John a-a-mu<sup>i</sup>-lom-a a-li George a-khaenj-a [o-muu-ndu o-w-a-mu-lip-a o-mu-eene].

John sm.c1-pst-om.c1-say-fv c1-that George sm.c1-look.for c1-c1-person wh-c1-pst-pay-fv c1-c1-own

'John said of him<sup>i</sup> that George is looking for [the person who paid him<sup>i</sup>].'

(26) ? John a-*mu*<sup>i</sup>-lom-a a-li o-mu-eene<sup>i</sup> a-rekukh-a [paata ya Mary khu-mu-khuu-p-a *o-mu-eene*<sup>i</sup>].

John sm.c1-om.c1-say-fv c1-that c1-c1-own sm.c1-leave-fv after Mary c15-om.c1-c15?-hit-fv c1-c1-own

'John said of him<sup>i</sup> that he<sup>i</sup> left after Mary hit him<sup>i</sup>.'

These correlate with the island/locality constraints for *wh*-movement in Lubukusu. The following are the corresponding island examples from Wasike (2006):


Based on the demonstrated island restrictions, I take the cliticization strategy to be movement of a pronoun from its argument position in the embedded clause to the matrix clause, and the applicative and prepositional phrase strategies to be base-generation of a pronoun or DP in the matrix clause. These same sentences are illicit without the appropriate embedded object marking, however:


The ungrammaticality of (31) is unsurprising, given the general island sensitivity of this construction. (30) shows that the embedded object marker is obligatory, a fact I will return to later. If the cliticization strategy is movement from the embedded clause to the matrix clause, I will have to explain why the embedded OM remains obligatory.

In summary, Lubukusu has three kinds of proleptic strategies:<sup>2</sup>


Three main characteristics are common across these constructions:


I will conclude that characteristics 2 and 3 come about by the same process, and so I will consider them together. Characteristic 1 is a separate concern, so I will address it first.

## **3 Nominal licensing**

In analyzing the island-sensitive clitic-licensed prolepsis, I generally follow analyses of cross-clausal agreement in Polinsky & Potsdam (2001), Bruening (2001), Branigan & MacKenzie (2002). The embedded DP A′-moves to the embedded left periphery. In Lubukusu, that pronoun can then undergo further A′-movement to cliticize to the matrix verb. I follow the analysis of clitics as incorporated pronouns from Matushansky (2006), Baker & Kramer (2018), more specifically implemented in Lubukusu as in Sikuku et al. (2018).

On this analysis, (14) has the preliminary structure in Figure 1.

<sup>2</sup>An anonymous reviewer astutely observes that there are also a variety of embedded-clause strategies as well. These appear to be subject to the general constraints on object marking and pronominals in Lubukusu, which for reasons of space I will not explore here. The reader is referred to Sikuku et al. (2018) for more in-depth discussion of Lubukusu object marking.

(14) Jack<sup>i</sup> a-*i*<sup>i</sup>-many-il-e a-li George a-mu<sup>i</sup>-siim-a *o-mu-eene*<sup>i</sup>.

Jack sm.c1-refl-knows-tns-fv c1-that George sm.c1-om.c1-like-fv c1-c1-own

'Jack<sup>i</sup> knows that George likes him<sup>i</sup>.'

Figure 1: Syntax of example (14)

The preposition-licensed and applicative-licensed cases, on the other hand, have a proleptic object that is base-generated in the matrix clause, introduced by a preposition or applicative, and then are related to the embedded pronoun by binding.

(11) John a-kanakan-*il*-a *Jane*<sup>i</sup> a-li Bill a-mu-siim-a *o-mu-eene*<sup>i</sup>/*niye*<sup>i</sup>.

John sm.c1-think-appl-fv Jane c1-that Bill sm.c1-om.c1-like-fv c1-c1-own/her

'John thinks of Jane<sup>i</sup> that Bill likes her<sup>i</sup>.'

Figure 2: Syntax of example (11)

The movement strategy is restricted to pronouns due to independent facts about Lubukusu object marking. The object markers are clitics, and these clitics can only be doubled by pronouns, and not by full DPs:


In principle, a full DP could undergo movement to the matrix clause, but Lubukusu has no way of licensing it there by cliticization. There is no position for it to move to. At the same time, although prepositions can provide licensing to an additional matrix argument, they are not viable landing sites for movement, and so preclude movement of an embedded argument into their complement. The specifier of an applicative phrase is an eligible landing site for movement, but also for base-generation of a proleptic object, so island effects are obviated in the presence of an applicative morpheme.

I can now offer a tentative explanation for why the embedded OM remains obligatory even in the movement cases. The embedded pronoun begins by receiving a theta role in the embedded clause, but while it is then syntactically licensed in the matrix clause via cliticization, it is not semantically licensed there. So the embedded clitic contains information about where (and from what) the embedded pronoun received semantic licensing, while the matrix clitic contains information about its syntactic licensing in the proleptic construction. Since the two copies contain different information, they both must be pronounced.

Since the distinction between movement-based and base-generated prolepsis ultimately rests on the particular nominal licensing strategies in Lubukusu, we should expect cross-linguistic variation along the lines of what types of nominals can be licensed in what position, and what that licensing strategy is: that is, what provides a syntactically appropriate place for the proleptic object to inhabit.

### **4 Acquaintance relations**

There are still several questions left to address, however. The obligatory binding relationship between base-generated proleptic objects and the embedded pronoun is so far unexplained, as is the topic-like interpretation found for all three types of prolepsis.

An important fact on the way to addressing these issues is that proleptic objects must always be read transparently (Salzmann 2006, 2017).

	- a. # Bill thinks of Wayne<sup>i</sup> that he<sup>i</sup> is a spy.
	- b. Bill thinks that Wayne is a spy.

Saying that the embedded clause is "about" the proleptic object is not sufficient to account for this data. The matrix attitude holder has to *knowingly ascribe* the embedded predicate to the proleptic object, and properly identify the proleptic object as well.

The framework I will use as a starting point for these facts is from Speas & Tenny (2003). They propose a set of projections in the left periphery to account for various perspectival phenomena. The projections include a Speech-Act Phrase (SAP), Evaluative Phrase (EvalP), and Evidential Phrase (EvidP). The projections host various null nominals that have a perspectival semantics, and can both bind embedded pronouns and be bound by higher nominals to force coreference. A sketch of their left periphery is in (36).<sup>3</sup>

All of these positions are inherently perspectival, however. Accordingly, they won't work for a proleptic object (which doesn't even have to be sentient, much less a perspective-holder). But within their system, there is space to add one more position, for an *evaluated object*. Speas & Tenny derive an extended SAP by head movement of the speech act head. The same movement can apply to the evaluative head, creating an additional position for the evaluated object. Rather than having a perspective-taking semantics, the evaluated object can be nonsentient, so long as it is the object perceived by the seat of knowledge evaluating the embedded propositional content. This projection is parallel to the Hearer in the speech act projection, but for the lower EvalP head.

<sup>3</sup>The multiple instances of *sa*(\*) in the tree below are derived via head-movement.

In base-generated prolepsis, the evaluated object binds the embedded agr-*eene*, and in turn the evaluated object is bound by the proleptic object in the matrix clause. Therefore the modified tree for (11) is in Figure 3.

(11) John a-kanakan-*il*-a *Jane*<sup>i</sup> a-li Bill a-mu-siim-a *o-mu-eene*<sup>i</sup>/*niye*<sup>i</sup>.

John sm.c1-think-appl-fv Jane c1-that Bill sm.c1-om.c1-like-fv c1-c1-own/her

'John thinks of Jane<sup>i</sup> that Bill likes her<sup>i</sup>.'

The movement-based prolepsis construction is much as it was before, but now we can pinpoint the left-peripheral location that serves as an escape hatch for the moved pronoun: it passes through the site of the evaluated object, and thereby receives its proleptic semantics. Then agr-*eene* moves further upward to cliticize to the matrix verb for its syntactic licensing.

Since both constructions involve the same projection in the left periphery, they get the same interpretation from the Eval head. Despite their disparate syntax, a common left periphery allows them to get the same semantics, one similar to topichood, though the proleptic object is not in a Topic projection in either case.

## **5 Cross-linguistic predictions**

Turning our attention to other languages, we can see that the difference between movement-based and base-generated prolepsis is how the nominal in the matrix clause is syntactically licensed, and whether that licensing position is eligible for movement or base-generation. For Passamaquoddy (Bruening 2001), Innu-Aimûn (Branigan & MacKenzie 2002), and Tsez (Polinsky & Potsdam 2001), agreement can reach to the CP domain and license the nominal there. But the nominal can only surface in the matrix clause if it is licensed by an agreeing matrix verb. If the verb surfaces in the non-agreeing (TI) voice, the nominal must stay in-situ, and there is no topicality:

Figure 3: Partial syntax of example (11)

Figure 4: Syntax of example (14)


	- a. N-uî-tshissenit-*en* tshetshî mûpishtâshkuenit *kassinu* *kâuâpikueshit*.

		1-want-know-ti if visited-2/inv every priest

		'I want to know if every priest visited you.'
	- b. \* N-uî-tshissenit-*en*<sup>i</sup> [*kassinu kâuâpikueshit*]<sup>i</sup> tshetshî mûpishtâshkuenit.

For Middle Dutch the matrix nominal is licensed by case marking, but on the analysis in van Koppen et al. (2016) it is in Spec,CP, although it has not been moved there. In German, prolepsis often feeds further movement that would otherwise be degraded (Salzmann 2017).

If prolepsis is used when A′-movement is degraded, then it comes as no surprise that the proleptic object in those constructions would not be moved into that position, since movement out of the embedded clause is impossible in the first place. And similar to the base-generation strategy in Lubukusu, the complement of a preposition is not an eligible landing site for A′-movement. If German only licenses extra matrix clause nominals with a preposition, then those extra nominals will necessarily be base-generated there. Once again, the particulars of a given language condition which of the movement and base-generation strategies are available, and under which circumstances.

These considerations bring to the fore an important distinction between semantic and syntactic licensing. Semantically, the evaluated object head provides a viable semantic interpretation for the extra matrix nominal, so long as the context supports that interpretation. Thus the left periphery is identical in both types of structure. The syntactic licensing requirements, however, differ by construction (and by language), as independently established. It is precisely these syntactic facts that derive the differences between prolepsis types.

### **Abbreviations**

| Abbreviation | Meaning |
|---|---|
| appl | Applicative |
| c (followed by a number) | Noun class marker |
| caus | Causative |
| fv | Final vowel |
| inv | Inverse voice |
| om | Object marker (typically followed by noun class number) |
| prep | Preposition |
| refl | Reflexive marker |
| sm | Subject marker (typically followed by noun class number) |
| ti | Transitive inanimate |
| tns | Tense |

## **Acknowledgments**

Special thanks to Justine Sikuku, Ken Safir, Mark Baker, Vivian Deprez, and the attendees of Rutgers ST@R and SURGE reading groups. Some of the data for this project are from the Afranaph Project (NSF BCS 1324404). I am indebted to two anonymous reviewers for their helpful comments. Any remaining errors are my own.

## **References**




# **Chapter 15**

# **Tense and aspect in Akan serial verb constructions**

### Augustina Owusu

Boston College

In Akan, tense and aspect in serial verb constructions have different distributions. Tense is repeated on all the verbs; aspect appears only on the first verb, and the following verbs have the *à* marker. In this paper, I argue that this difference in distribution is a consequence of being evaluated by different syntactic mechanisms. Evaluation of unvalued features is licensed by two syntactic mechanisms, Agree and Selection/Sel(ect)-Merge. In Akan, tense is evaluated by Agree and aspect by Selection. The -*à* morpheme that appears on the non-initial verbs in these clauses is the phonological realization of the morphosyntactic feature bundle [−prog, −fut]. Tense morphology depicts T-v Agree. The same tense inflection appears on all the verbs because one T-head parallel-Agrees with all of them. Since a single T-head cannot have different interpretations, this results in the matching restriction.

### **1 Introduction**

In this paper, I claim that both Agree and Selection mechanisms are necessary to explain the morphological distribution of tense and aspect in Akan Serial Verb Constructions (SVCs). In SVCs, tense morphology occurs on the first verb and all subsequent verbs. Aspect, on the other hand, occurs only on the first verb, and all subsequent verbs take the *à* morpheme/consecutive marker (Dolphyne 1996, Osam 2003). I propose that the mechanism responsible for tense evaluation is Agree. There is a single T, which Agrees with all the verbs in an SVC (Pesetsky & Torrego 2007). Aspect in Akan, on the other hand, is valued by *Selection*. Following the idea in Kandybowicz (2010, 2015) that aspect is merged within the vP in Akan, I posit two aspect projections in Akan: outer aspect, a projection above the vP where aspect is interpreted, and inner aspect, where aspect is base generated. There is a compatibility/selection requirement between inner and outer aspect features. Since Selection is strictly between sisters, I adopt Webelhuth's (1992) percolation principles to percolate inner aspect features to the sister of outer aspect.

Augustina Owusu. 2022. Tense and aspect in Akan serial verb constructions. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 265–281. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393762

The next section presents the data and compares the distribution of tense and aspect in Akan with that in related languages. Then in §3, I argue that Akan SVCs are covert coordination of vPs. §4 presents the analysis of why tense and aspect have the particular distribution they do. §5 is the conclusion.

### **2 The data**

In Akan SVCs, tense morphology occurs on all verbs. As shown in example (1), the past tense morpheme is present on the verb *throw*, and the verb *go*. However, not all tense sequences are allowed. For instance, example (1b) with past tense on the first verb and present tense on the second verb is ungrammatical. Also, the sequence, tense on the first verb, aspect on the second verb, is ungrammatical, as illustrated by (1c).

(1) a. V(Pst) V(Pst)

Kofi to-*o* boɔ kɔ-*ɔ* dan no mu.

Kofi throw-pst stone go-pst room def PostP

'Kofi threw a stone into the room.'

b. \* V(T(Pst)) V(T(Present))

Kofi to-o boɔ kɔ dan no mu.

Kofi throw-pst stone go.pres room def PostP

c. \* V(Pst) V(Asp)

Kofi to-*o* boɔ *re*-kɔ dan no mu.

Kofi throw-pst stone prog-go room def PostP

Based on the data above, there appears to be a restriction in the grammar that requires that SVCs must have the same tense specification.

Aspect in these SVCs has a different set of restrictions. Only the first verb can have actual aspect morphology, i.e., the morphology that shows up in simple clauses. All subsequent verbs have the *à*- morpheme, which Dolphyne (1996) and Osam (2003) refer to as the consecutive marker, as in example (2). In (2a), *throw* occurs with the prefix *re-*, which marks progressive aspect, but *go* occurs with the *à*- morpheme. Unlike with tense, having the same aspect on all the verbs is ungrammatical, as in (2b). As shown in example (2c), aspect on *throw* with past tense on *go* is also ruled out.

<sup>1</sup>All unattributed examples are from the author, a native speaker of the Asante Twi dialect of Akan.


The *à*- morpheme is ambiguous between the perfect marker and the consecutive marker. In (3), for instance, the sentence is grammatical if *à*- is interpreted as perfect, but ungrammatical if interpreted as the consecutive morpheme.

(3) Kofi à-da.

Kofi perf/\*cons-sleep

'Kofi has slept.'

The distribution of tense and aspect in Akan SVCs, especially the multiple occurrences of tense on all verbs, distinguishes Akan from related languages such as Ewe. In Ewe SVCs, tense and aspect are marked only once, by a non-affixal morpheme that precedes all the verbs, as illustrated in the example below from Collins (1997).

(4) Ewe (Collins 1997)

Ekpe *a* ʄo kɔpo yi xɔ-me.

rock fut hit cup go room-in

'A rock will hit a cup into the room.'

'A rock will hit a cup, and the cup will go into the room.'

Tense attaches to neither of the verbs. The distribution of the tense morpheme in Ewe is in consonance with Baker & Stewart's (2002) observation that in most SVC languages there is no feature checking between T and v, and thus no tense morphology on the verb.

In the next section, I argue that what have hitherto been termed SVCs in Akan are covert coordination of *v*Ps.

### **3 SVCs as coordinate constructions**

Pace Martin (2011), I argue that Akan SVCs involve covert coordination of little *v*Ps. I assume that subjects are merged in voiceP (Kratzer 1996). As such, there is only one subject position and one tense position in these constructions. The presence of a single subject position accounts for the fact that there can only be one subject. Further, the single tense position explains the tense restriction. Though there is a lot of disagreement in the literature on what qualifies as an SVC, SVCs are generally assumed to have the following features: they contain at least two (main) verbs in what appears to be a single clause (Veenstra 1993); they have only one (phonologically non-null) subject and no overt subordination or coordination markers (Jansen et al. 1978, Sebba 1987). Verbs in SVCs typically have the same specification for tense, mood, aspect, and polarity (Baker 1989, Collins 1997). The disagreement also extends to the right kind of analysis for SVCs. Analyses of SVCs can be divided into three broad categories: complementation (Baker 1989, Baker & Stewart 2002, Collins 1997, Aboh 2009), adjunction (Baker & Stewart 2002), and coordination (Agbedor 1994). The present proposal falls under the coordination account. Akan covert coordinations (SVCs) have the structure in Figure 1.

Following Kandybowicz (2010, 2015), I maintain that aspect originates inside *v*P in Akan; as a consequence, there are multiple aspects in SVCs. In addition, I posit another aspect projection outside *v*P, which I refer to as *outer aspect*. Outer aspect is the position where aspect is interpreted. This move is necessary since aspect cannot be interpreted within little *v*P: aspect is an event quantifier, ⟨⟨⟩⟨⟩⟩, and so cannot compose with the semantic type of its sister, VP.

The covert coordination accounts for the differences between Akan SVCs and SVCs in related languages. Another feature that distinguishes Akan SVCs from others is the absence of object sharing, i.e., two transitive verbs sharing the same object. In most languages, the shared object is sandwiched in between the two verbs, as in example (5).

	- a. ò dà sɛ́ la nɛnè́ ɔ̀ɔ̀.
	  3sg pst roast F meat eat
	  'He roasted meat and ate it.'
	- b. \* ò dà sɛ́ la ɔ̀ɔ̀ nɛnè́.
	  3sg pst roast F eat meat
	  'He roasted meat and ate it.'

Figure 1: Covert coordination in Akan

The sentence is ungrammatical if the object follows the last verb. In Akan, however, all transitive verbs get their own objects as illustrated in (6).

(6) Kofi kye-e abɔfra no bo-o no.
    Kofi catch-pst child det beat-pst 3sg
    'Kofi caught the child and spanked him.'<sup>2</sup>

In this sentence, the objects of *kyee* 'catch' and *boo* 'beat' are semantically the same: the child. However, each verb has its own syntactically realized object; the object of 'catch' is 'the child,' while the syntactic object of 'beat' is the 3rd person singular pronoun.

<sup>2</sup> Inanimate pronouns are independently dropped in clause-final position.

(i) Kofi kye-e nam we-e.
    Kofi catch-pst meat chew-pst
    'Kofi caught fish and ate it.'

## **4 Distribution and analysis of tense and aspect**

In this section, I account for the difference in the distribution of tense and aspect in SVCs. I propose that different mechanisms value tense and aspect; tense by Agree and aspect by Selection. The analysis of tense is spelled out below, and the analysis of aspect in §4.2.

#### **4.1 Tense**

In §3, I proposed a structure for SVCs that includes only one tense projection. Recall that in Akan tense is marked on all verbs in an SVC, but the tense values on the verbs must match. This is why (7b) is ungrammatical.

	- b. \* Kofi tɔ-ɔ aduane re-di.
	  Kofi buy-pst food prog-eat

I argue that the tense restriction we see in SVCs is a result of this single T projection simultaneously Agreeing with all verbs. To account for the relation between the single tense projection and the multiple verbs in SVCs, I adopt Pesetsky & Torrego's (2007) theory of Agree and an adaptation of Hiraiwa's (2001) Multiple Agree. Since Multiple Agree is defined for goals that are in a c-command relation, I define Parallel Agree to account for SVC contexts, where there is no c-command between the goals.

Agree, according to Pesetsky & Torrego (2007), is feature sharing. They argue that agreement involves features of lexical items that differ as to whether they are valued/unvalued and interpretable/uninterpretable (Table 1).

Table 1: Pesetsky & Torrego (2007): valuation and interpretability of features

Certain lexical items come from the lexicon unvalued and receive valuation from a valued instance of the same feature present on another lexical item. Interpretability, on the other hand, is concerned with whether or not a feature of a particular lexical item makes a semantic contribution to the interpretation of that item. For instance, the tense feature on verbs is valued but uninterpretable, whereas the tense feature on T is unvalued but interpretable. In their theory, Agree leads to feature sharing, not feature deletion. The Agree relation is illustrated below with the sentence in (8).

(8) Kofi bought food.

Figure 2: Agreement between tense and the verb (before Agree)

Agree is thus a two-step process. For instance, the interpretable but unvalued tense feature on T, ([iT val[]]), probes its c-command domain for the uninterpretable but valued tense feature on v, ([uT val[past]]). When it finds it, Agree is established, and T and v now share the tense feature: ([iT val[past]]) and ([uT val[past]]). If this Agree relation were not established, the tense feature would not receive an interpretation.

The Agree relation is, however, only defined for T Agreeing with a single head. For T to Agree with multiple verbs, I propose a modification of Hiraiwa (2001) that allows T to Agree with goals that are not in a c-command relation. I refer to the modified Multiple Agree process as *Parallel Agree*, defined below.

(9) Parallel Agree with a single probe is a single simultaneous syntactic operation: Agree applies to all matched goals at the same derivational point. There is no c-command between the goals.

Figure 3: Agreement between tense and the verb (after Agree)

Figure 4: Agree (α, β, γ), where α is a probe and β and γ are matching goals for α, and β and γ do not c-command each other.

With Parallel Agree, we can derive the Agree relation between the single T projection and all the verbs in an SVC. This is illustrated below.

In this derivation, T probes and matches with all the verbs, then simultaneously Agrees with them. The unvalued but interpretable tense feature ([iT []]) on T Agrees with all the matched goals at the same derivational point, establishing Agree (iT[pst]...uT *val*[pst]...uT *val*[pst]).

Now we can explain the matching restriction in SVCs. Since T can only have one interpretable feature, we predict that for a sentence to be grammatical, all the verbs in the sentence must have the same tense feature. For instance, if the first verb is valued for past tense and the second verb is valued for present tense, T receives two different tense values to interpret, and the structure is therefore uninterpretable.
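The matching restriction can be illustrated with a small sketch. This is only a toy model under stated assumptions: the class `TenseFeature` and the function `parallel_agree` are hypothetical names introduced here, tense values are plain strings, and "crashing" is modeled as returning `False`. It is not meant as an implementation of Pesetsky & Torrego's or Hiraiwa's systems.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TenseFeature:
    interpretable: bool          # iT (True) vs. uT (False)
    value: Optional[str] = None  # None models an unvalued feature, val[]


def parallel_agree(probe: TenseFeature, goals: List[TenseFeature]) -> bool:
    """Let one unvalued probe Agree with all matched goals at once.

    Returns True if the derivation converges, False if it crashes.
    """
    # Match: every valued tense feature in the probe's domain is a goal.
    values = {g.value for g in goals if g.value is not None}
    # T hosts a single interpretable tense feature, so the goals
    # must all supply the same value.
    if len(values) != 1:
        return False
    # Feature sharing: the probe now carries the goals' common value.
    probe.value = values.pop()
    return True


# Two past-marked verbs: T is valued [pst] and the derivation converges.
t_ok = TenseFeature(interpretable=True)
ok = parallel_agree(t_ok, [TenseFeature(False, "pst"), TenseFeature(False, "pst")])

# Past plus present: T would receive two values, so the derivation crashes.
t_bad = TenseFeature(interpretable=True)
bad = parallel_agree(t_bad, [TenseFeature(False, "pst"), TenseFeature(False, "prs")])

print(ok, t_ok.value)    # True pst
print(bad, t_bad.value)  # False None
```

The sketch collects all goal values in one step rather than visiting the verbs in sequence, which is the simplest way to mimic the claim that Agree applies to all matched goals at the same derivational point.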

#### **4.2 Aspect**

The central claim in this section is that *à*- is the spell-out of an inner aspect head with content-less neutral features [−prog, −fut]. As we have already seen, *à*- is only licensed in coordinate constructions, not in other multi-verb constructions. I argue that this is due to the asymmetric relationship between the external and internal conjuncts in coordinate structures. The distribution of aspect is shown in (2), repeated here.


Cross-linguistically, aspect is thought to be below T (see Rizzi 2004, 2013, Cinque 2002, 2006, Cinque & Rizzi 2010, Rizzi & Cinque 2016, a.o.). Following Kandybowicz (2010, 2015), I argue that aspect in Akan is merged lower, i.e., within the vP. However, as a quantifier of events (Hacquard 2006), aspect is only interpretable above vP; therefore, aspect is not interpreted where it is base-generated in Akan. To solve this apparent mismatch, I argue that there is another aspect projection above vP, where aspect is interpreted. I refer to this projection as outer aspect. Inner aspect is the morphological projection of aspect.<sup>3</sup> Descriptively, outer aspect and inner aspect are required to be compatible, or to match. Matching is checked through *Selection/Sel(ect)-Merge*. The proposal is that outer aspect *selects* for an inner aspect with compatible features. Since outer aspect and inner aspect are not sisters, I assume that aspect features percolate to VoiceP, the sister of outer aspect, where compatibility is checked. Webelhuth's (1992) percolation principles govern the feature percolation.

Figure 5: Parallel Agree with two past tenses (before Agree)

Figure 6: Parallel Agree with two past tenses (after Agree)

Figure 7: Parallel Agree with past tense and present tense (before Agree)

Figure 8: Parallel Agree with past tense and present tense (after Agree)

	- b. If the head of XP is not marked for F and spec of XP is marked for F, then F percolates from the specifier to the XP. (This could be done via the head, using spec-head agreement.)
	- c. If neither the head of XP nor the spec of XP is specified for a value of F (+ or −) and the complement is specified, then the complement can percolate features to XP.

These principles determine where features can percolate from.
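The priority ordering among head, specifier, and complement can be sketched as a simple lookup. The sketch below makes several assumptions that go beyond the text: the names `XP` and `percolate` are hypothetical, feature bundles are modeled as frozen sets of strings, and the first branch (the head's own features take priority) is inferred from the wording of clause (b), since the first clause of the principle is not reproduced in this section.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# None means "not marked for F"; otherwise a bundle such as {"+prog", "-fut"}.
Features = Optional[FrozenSet[str]]


@dataclass
class XP:
    head: Features = None
    spec: Features = None
    comp: Features = None


def percolate(xp: XP) -> Features:
    """Return the feature bundle that reaches the XP node."""
    if xp.head is not None:   # assumed first clause: the head wins if marked
        return xp.head
    if xp.spec is not None:   # clause (b): specifier, if the head is unmarked
        return xp.spec
    return xp.comp            # clause (c): complement, if nothing else is marked


# A coordination whose head is unmarked for aspect: the external conjunct
# (specifier) is progressive, the internal conjunct (complement) is neutral.
coord = XP(spec=frozenset({"+prog", "-fut"}), comp=frozenset({"-prog", "-fut"}))
result = percolate(coord)
print(sorted(result))
```

Under these assumptions only the specifier's bundle reaches the coordinate phrase, so the internal conjunct's features never become visible higher in the structure.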

Inner aspect is associated with three morphosyntactic feature bundles: [+prog, −fut] (progressive), [+fut, −prog] (future), and [−prog, −fut] (*à*-). *à*- has no semantic input. The interpretation of aspect is the union of outer aspect and inner aspect, but since outer aspect has no value, inner aspect determines the value of the construction. Through percolation, inner aspect features get to VoiceP, where outer aspect can check them for selection. I argue that since *à*- has no semantic input, it is infelicitous in any position where it has to contribute to interpretation. This is what happens in the simple clause, as in (3). In a simple clause, the feature bundle of *à*- percolates to VoiceP, where it is accessible to outer aspect, but there are no interpretable aspect features. We can extend the same argument to all contexts where consecutive *à*- is not felicitous. For instance, as the first verb in an SVC, the features of *à*- have to percolate to outer aspect for interpretation. The analysis here borrows the idea from Zhang (2009) that external conjuncts are privileged, i.e., only the features of the external conjunct are relevant for feature selection.

For instance, a verb that does not select for a CP complement can take a conjoined DP and CP as its complement as long as they are in that order. In (11), though the preposition *on* selects a DP, it can take a conjoined DP and CP, but only if the DP is the external conjunct.

<sup>3</sup>The idea of two aspect projections in the syntax is not new: inner aspect is used to refer to lexical/situational aspect, while outer aspect refers to grammatical/viewpoint aspect (see Travis 1991, 2010, Smith 1991, MacDonald 2006). In Tagalog and Navajo, inner aspect also encodes grammatical aspect (Travis 2010).

(11) a. You can depend on *my assistant* and *that he will be on time*. (DP & CP)
    b. \* You can depend on *that he will be on time*.

In the same way, *à*- is only licensed in the internal conjunct of a coordinate clause. The *à*- bundle [−prog, −fut] in an internal conjunct position does not percolate to VoiceP and is not selected by outer aspect for interpretation. The only features that are accessible to outer aspect are the inner aspect features in the external conjunct. The tree in Figure 9 is a representation of example (2a) above.<sup>4</sup>

This same logic can be used to account for the ungrammaticality of *à*- in both the \*V(à) V(à) and \*V(à) V(Asp) sequences. But nothing I have said so far rules out having two overt aspects with interpretation, i.e., \*V(Asp) V(Asp), though empirically it is ungrammatical. I derive the ungrammaticality of this sequence from an economy of interpretation principle modeled after Bošković's (1997) Minimal Structure Principle.

(12) Economy of interpretation

Provided that lexical requirements of relevant elements are satisfied, if two representations have the same gross syntactic structure and have the same interpretation, then the version that has fewer *+features* is chosen as the morphosyntactic representation serving that function.

	- b. \* re (+prog, −fut) + bɛ (−prog, +fut)
	- c. re (+prog, −fut) + *à* (−prog, −fut)

Based on the privileged position of external conjuncts of coordinate structures and the fact that the feature percolation principle requires that features percolate from the specifier, all the cases in (13) end up with the same aspect interpretation. As such, semantically they serve the same function. Therefore, the simplest version, i.e., (13c), with the fewest *+features*, is chosen.
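The competition in (13) can be sketched as follows. This is a toy illustration under stated assumptions: the names `interpretation`, `plus_features`, and `most_economical` are hypothetical, a feature bundle is a dictionary from feature names to booleans, and only the two candidates reproduced above, (13b) and (13c), are compared.

```python
from typing import Dict, List, Tuple

Bundle = Dict[str, bool]        # feature name -> True for "+", False for "-"
Pair = Tuple[Bundle, Bundle]    # (external conjunct, internal conjunct)

RE: Bundle = {"prog": True, "fut": False}    # re-  [+prog, -fut]
BE: Bundle = {"prog": False, "fut": True}    # bɛ-  [-prog, +fut]
A: Bundle = {"prog": False, "fut": False}    # à-   [-prog, -fut]


def interpretation(pair: Pair) -> Tuple[Tuple[str, bool], ...]:
    """Only the external conjunct's features reach outer aspect."""
    external, _internal = pair
    return tuple(sorted(external.items()))


def plus_features(pair: Pair) -> int:
    """Count the '+' valued features across both verbs."""
    return sum(value for bundle in pair for value in bundle.values())


def most_economical(candidates: List[Pair]) -> Pair:
    """Among candidates with one shared interpretation, pick the fewest '+'."""
    readings = {interpretation(c) for c in candidates}
    if len(readings) != 1:
        raise ValueError("candidates do not compete: interpretations differ")
    return min(candidates, key=plus_features)


candidates = [(RE, BE), (RE, A)]   # (13b) and (13c)
winner = most_economical(candidates)
print(winner == (RE, A))           # True
```

Because both candidates begin with *re*-, they receive the same reading in this model, and the one with a single *+feature* wins.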

## **5 Conclusion**

I have argued that tense and aspect have different distributions in these clauses because they are governed by distinct mechanisms, tense by Agree and aspect by Selection. The *à*- morpheme that appears on the non-initial verbs in these clauses is the phonological realization of the morphosyntactic feature bundle [−prog, −fut] of inner aspect. It is only licensed in a position where its aspectual features do not percolate to VoiceP. Tense morphology reflects T-v Agree. There are multiple instances of tense because a single T head parallel-Agrees with all the verbs. Single T-head Agree results in the matching restriction. This difference in the mechanism used for the valuation of features is not arbitrary; it is governed by the position of the syntactic objects involved. There is one T merged in these constructions but multiple verbs with tense features that need to be checked; the single T thus enters into an Agree relation with all of them. Aspect, on the other hand, is merged multiple times

<sup>4</sup>A reviewer notes that the feature percolation proposed leads us to expect arbitrarily long-distance morphological dependencies. This is indeed a problem for feature percolation. For now I assume that features can percolate until they meet an operator that checks those features.

Figure 9: Percolation of aspect features

## **Abbreviations**


## **Acknowledgments**

The chair of my second qualifying paper, Professor Mark Baker, and my committee members, Ken Safir and Viviane Deprez, helped shape most of the ideas in this paper. I would like to thank the Rutgers syntax reading group (STAR), whose comments and insights helped improve this paper. I am also grateful to the participants of ACAL 49, whose questions and comments were beneficial. I have also benefited from the comments and insights of two anonymous reviewers.

## **References**


#### Augustina Owusu


# **Chapter 16**

# **Counting mass nouns in Guébie**

## Hannah Sande<sup>a</sup> & Virginia Dawson<sup>b</sup>

<sup>a</sup>Georgetown University <sup>b</sup>University of California, Berkeley

This paper contributes to the growing body of work on countability properties of nouns across languages by investigating the three-way countability distinction in Guébie, an Eastern Kru language spoken in Southwest Côte d'Ivoire. Guébie distinguishes three core categories of noun, which we call *true mass, count*, and *countable mass nouns*, and possesses a singulative suffix which converts countable mass nouns into count nouns. We use a mereological model to capture this three-way distinction and the effects of the singulative suffix.

## **1 Introduction**

This paper investigates the countability properties of nouns in Guébie, an Eastern Kru language spoken in Southwest Côte d'Ivoire. Guébie distinguishes three core categories of noun, based on number marking. We adopt a mereological model based on properties of cumulativity and divisibility to account for the behavior of these nouns. Additionally, we situate Guébie's system in the emerging typology of countability distinctions cross-linguistically.

Guébie is an endangered Kru language spoken by no more than 7,000 speakers in Côte d'Ivoire. There is one known monolingual speaker, while other speakers are bilingual in Guébie and French, and often other neighboring Kru languages. The data presented here was collected over the past five years in Sande's work with the Guébie community (Sande 2017). The specific forms in this paper have each been confirmed by at least two male speakers, ages ~30 and ~40.

In §2 we present the morphological number marking and syntactic distribution facts for the three categories of nouns in Guébie. §3 lays out a semantic analysis of the three degrees of countability in Guébie, based on a mereological approach. §4 briefly situates Guébie within the growing typology of number marking, and §5 concludes.

## **2 Guébie number marking**

In this section we show that Guébie distinguishes three noun categories based on number marking: count nouns, true mass nouns, and countable mass nouns.


The diagnostics for these three categories are based primarily on their compatibility with Guébie's number morphology: the plural marker (/-a/ or /-i/) and the singulative marker (/-je/ or /- bə/). The two plural markers and two singulative markers are allomorphs and do not differ in meaning (Sande 2017).<sup>1</sup>

#### **2.1 Count nouns**

Count nouns in Guébie have a singular individual interpretation in their bare form. These include words for humans, large animals, and items that typically do not come in groups, e.g. [ŋʷɔnɔ4.4] 'woman', [ bə31] 'plate', [mεɔ3.1] 'tongue'.<sup>2</sup> Bare count nouns cannot have a plural or substance interpretation. This is shown in (1), where the bare form of [ bə31] 'plate' cannot be predicated of a plural subject.

(1) \* liene3.3.1 dem.pro.prox εja2.3 with lieko3.3.1 dem.pro.dist bə<sup>31</sup> plate mɔ<sup>1</sup> be.emph Intended: 'This thing and that thing are plate(s).'

<sup>1</sup>The two singulative markers do not seem to differ in meaning, and there are phonological traits which explain their distribution. One speaker expresses an intuition that nouns that take /-je/ are often small, while nouns that take /- bə/ are often large and/or round, but this intuition does not hold up across the collected data. More work will be done in the future to explore this area. If a difference in size is found to be conveyed, a classifier-like analysis of the singulative markers might be more appropriate than the one presented here; though see §4 on how classifiers are semantically similar to the singulative marker in Guébie.

<sup>2</sup>Guébie has four distinct tone heights, marked with numbers 1–4, where 4 is high.

These nouns combine directly with the plural suffix (/-a/ or /-i/) to yield a plural reading. Example (2), in contrast to (1), shows that morphologically plural-marked count nouns can be predicated of plural subjects.

(2) liene3.3.1 dem.pro.prox εja2.3 with lieko3.3.1 dem.pro.dist bə-i3.12 plate-pl mɔ<sup>1</sup> be.emph 'This thing and that thing are plates.'

Table 1 shows a selection of count nouns in their bare form and with the pl suffix.<sup>3</sup>


Table 1: Count nouns in Guébie

Count nouns cannot combine with the singulative suffix, as shown in (3).

	- a. \* mεɔ3.1 - bə/je<sup>1</sup> tongue-sg Intended: 'a tongue'
	- b. \* bə<sup>31</sup> - bə/je<sup>1</sup> plate-sg Intended: 'a plate'

<sup>3</sup>Both plural suffixes in Guébie are associated with a level tone 2. When attached to a root, if the root is associated with more underlying tone heights than syllables (e.g. two tone levels on a monosyllabic word, as in example a in Table 1), then we see one-to-one association of syllables to tone heights beginning at the left, and any leftover tone heights form a contour together with the plural level 2 at the right edge.

Only the plural form of a count noun can combine with a numeral greater than one, as shown in (4).

	- a. mεɔ-ɪ3.1.2 tongue-pl ta3 three 'three tongues'
	- b. \* mεɔ3.1 tongue ta3 three Intended: 'three tongues'
	- c. bə-i3.12 plate-pl ta3 three 'three plates'
	- d. \* bə<sup>31</sup> plate ta3 three Intended: 'three plates'

Similarly, only the plural form of a count noun can combine with an 'all' or 'many' quantifier, as shown in (5). The translations marked with "#" are impossible interpretations of these utterances.

	- a. bə-i3.12 plate-pl a ba4.2 all 'all the plates', #'all the plate'
	- b. bə-i3.12 plate-pl butugba3.1.1 much 'many plates', #'much plate'
	- c. \* bə<sup>31</sup> plate a ba4.2/ butugba3.1.1 all/much Intended: 'all/much plate' or 'all/many plates'

In sum, count nouns in Guébie act much like count nouns in English. They have a singular interpretation in their bare form and a plural interpretation when combined with plural morphology. In the latter case, they can appear with a numeral greater than one, or with quantifiers 'all' and 'many'.

#### **2.2 True mass nouns**

The second class of nouns in countability terms in Guébie are the true mass nouns. These nouns refer to substances, including liquids like 'blood' and 'oil', and those consisting of very tiny particles like 'sand' and 'salt'.

True mass nouns can only surface in their bare form. Unlike count nouns, mass nouns cannot combine directly with the plural suffix. Additionally, mass nouns cannot combine with the singulative suffix, as shown in Table 2.


Table 2: True mass nouns in Guébie

True mass nouns can never combine with numerals in Guébie, as shown in (6).

	- a. dodo3.2 sand la2 of ci-ə2.2 type-pl ta3 three 'three types of sand'
	- b. \* dodo3.2 sand ta3 three Intended: 'three sands'

Unlike count nouns, which cannot combine with quantifiers 'all, many' in their bare form (5), bare mass nouns combine with quantifiers (7).

(7) Quantifiers can modify bare mass nouns


c. dodo3.2 sand a ba4.2 all 'all the sand'

In sum, true mass nouns never appear with number-marking morphology, and they cannot be modified by numerals. Unlike count nouns, they can be modified by quantifiers in their bare form.

#### **2.3 "Countable" mass nouns**

The third class of nouns, which we call "countable" mass nouns, shows split behavior: bare countable mass nouns pattern with mass nouns, while sg-marked countable mass nouns pattern with count nouns.

The countable mass class makes up a large part of the Guébie lexicon, consisting of individuals that typically come in groups. These include insects, small animals, body parts, fruits and vegetables, grains and nuts, stars, ashes, etc.<sup>4</sup>

Like mass nouns, bare countable mass nouns cannot combine directly with the plural suffix, as shown in Table 3.


Table 3: Countable mass nouns in Guébie

Again like mass nouns, and unlike count nouns, bare countable mass nouns cannot combine with numerals, but can combine with quantifiers. This is shown in (8).

<sup>4</sup> Interestingly, 'water' also falls into this class: when it combines with the sg suffix, it refers to a body of water such as a lake. For the present, we set 'water' aside, as we are unsure to what extent coercion plays a role.

(8) a. \* ɟa<sup>31</sup> coconuts ta3 three Intended: 'three coconuts'
    b. ɟa<sup>31</sup> coconuts a ba4.2 all 'all coconuts'<sup>5</sup>

Unlike both other classes of nouns, countable mass nouns can combine with the sg suffix to yield a singular individual reading. Just like bare count nouns, these sg-marked nouns cannot be predicated of plural subjects, as shown in (9).

(9) \* liəne3.3.1 dem.pro.prox εja2.3 with liəko3.3.1 dem.pro.dist ɟa- bə3.1 coconuts-sg mɔ<sup>1</sup> be.emph Intended: 'This thing and that thing are coconuts.'

However, this sg form can then be pluralized with the /-a, -i/ plural marker, in which case it can surface as the predicate of a plural subject,<sup>6</sup> as in (10).

(10) liəne3.3.1 dem.pro.prox εja2.3 with liəko3.3.1 dem.pro.dist ɟa- bə-i3.1.2 coconuts-sg-pl mɔ<sup>1</sup> be.emph 'This thing and that thing are coconuts.'

Table 4 shows these number marking patterns for a selection of countable mass nouns.

Like plural count nouns, pl-marked countable mass nouns (noun-sg-pl) can combine with numerals greater than one and quantifiers, but a noun-sg form cannot. This is shown in (11) and (12).

(11) -sg-pl mass nouns with numerals


<sup>5</sup>More data is needed to know whether this has a definite interpretation similar to using a universal quantifier with a mass noun in English, and whether (8b) is interpreted differently than (12a).

<sup>6</sup> See Marchese (1979: 88–89) for a 2-way split in other Kru languages between countable nouns that take a plural suffix directly and countable mass nouns which take sg-pl suffixes.


Table 4: Singular and Plural on countable mass nouns

(12) -sg-pl mass nouns with quantifiers

a. ɟa- bə-i3.1.2 coconuts-sg-pl a ba4.2 all 'all coconuts'

b. \* ɟa- bə3.1 coconuts-sg a ba4.2 all Intended: 'all coconuts'

To summarize, bare countable mass nouns pattern with true mass nouns in that they cannot take plural marking or be modified by a numeral. By contrast, the sg-marked form of a countable mass noun patterns with count nouns. The sg-marked form yields a singular individual interpretation, can take plural marking, and can be modified by a numeral (by the numeral one in the noun-sg form, and by any numeral greater than one in the noun-sg-pl form). These properties are summarized in Table 5.

Table 5: Properties of noun types in Guébie


#### **2.4 Summary**

Based on the distribution of singular and plural suffixes as well as numerals, we have seen that there is at least a three-way distinction in countability across nouns in Guébie: count nouns (e.g. 'plate,' 'woman'), countable mass nouns (e.g. 'coconuts,' 'finger'), and true mass nouns (e.g. 'blood', 'sand').

## **3 Semantics**

An analysis of the above data must account for (i) the different distribution and behavior of count nouns, true mass nouns, and countable mass nouns, and (ii) the distribution of sg and its semantic effect (i.e. that it takes a countable mass noun and turns it into a count noun). We assume here that the pl marker in Guébie is analogous to pl marking in languages like English.

#### **3.1 Count nouns vs. true mass nouns**

A concrete way to model countability distinctions relies on notions of cumulativity and divisibility.<sup>7</sup> These properties are defined in (13) and (14) respectively.
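For orientation, one standard way of stating the two properties is given below. This is a textbook-style formulation and may differ in detail from the definitions in (13) and (14); the symbols $\oplus$ (mereological sum) and $<$ (proper part) are notational assumptions introduced here.

$$\begin{aligned} \text{Cumulative}(P) &\iff \forall x\,\forall y\,[P(x) \land P(y) \rightarrow P(x \oplus y)] \\ \text{Divisive}(P) &\iff \forall x\,[P(x) \rightarrow \forall y\,[y < x \rightarrow P(y)]] \end{aligned}$$

On this formulation, a predicate is cumulative if it is closed under sum, and divisive if it holds of every proper part of anything it holds of.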


Noun denotations that are neither cumulative nor divisive have been termed "quantized" (Krifka 1989, Deal 2017), while those that are both cumulative and divisive have been termed "homogeneous" (Bunt 1985, Deal 2017). These properties distinguish English singular count nouns and mass nouns respectively.

For example, consider the count noun *plate*. If some thing A can be truly described as a plate, and B can also be truly described as a plate, it does not follow that A+B are a plate. Instead, A+B are truly described as plates. This shows that the English noun *plate* is not cumulative. Likewise, if A can be truly described as a plate, it does not follow that some subpart of A is also a plate. Instead, it would be described as part of a plate. This shows that English *plate* is not divisive.

<sup>7</sup> See Quine (1960), Cheng (1973), Link (1983), Krifka (1989), Doetjes (1997), Grimm (2012b), and Deal (2017), among others.

In contrast, consider the mass noun *sand*. If there is some thing A that can be truly described as sand, and B can also be truly described as sand, it follows that A+B are sand. Unlike *plate*, the English noun *sand* is cumulative. Likewise, if A can be described as sand, it follows that some subpart of A is also sand. The English noun *sand* is also divisive.

This is summarized in (15) and (16).

(15)
	- a. A is a plate, and B is a plate, but A+B are not a plate
	- b. A is a plate, but any subpart of A is not a plate

(16)
	- a. A is sand, and B is sand, and A+B is sand
	- b. A is sand, and any subpart of A is sand

We can schematize these properties of count and mass nouns as in (17). The denotation of a quantized noun like *plate* contains only non-overlapping individuals: while individual plates a, b, and c are in the denotation of *plate*, their sums and subparts are not. In contrast, the denotation of a cumulative noun like *sand* only contains members that overlap with other members: each member of the denotation of *sand* is a subpart of another member, and shares each of its subparts with another member.

(17) a. ⟦plate⟧ = {a, b, c}
    b. ⟦sand⟧ = {ab, bc, ac, abc}

This analysis of the English count/mass distinction extends nicely to Guébie's count nouns and true mass nouns. Just like in English, Guébie's count nouns are quantized (i.e. neither divisive nor cumulative), and its true mass nouns are homogeneous (i.e. both divisive and cumulative). This is schematized in (18).

(18) a. ⟦bə<sup>31</sup> 'plate'⟧ = {a, b, c}
    b. ⟦dodo<sup>3.2</sup> 'sand'⟧ = {ab, bc, ac, abc}

This analysis allows us to account for the distributional differences of pl between count nouns and true mass nouns: just like in English, pl can only combine with quantized denotations.<sup>8</sup> It also allows us to capture the restriction on numeral modification: numerals can only modify quantized denotations.<sup>9</sup>
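The schematic denotations in (17) and (18) can be checked mechanically. The sketch below is a toy model under stated assumptions: individuals are modeled as frozen sets of atom labels, sum is set union, and all function names are hypothetical. Because the schematic sets are finite, the sketch does not test divisiveness directly; it uses the description in the text instead, asking whether the minimal members of a denotation overlap one another.

```python
from itertools import combinations
from typing import FrozenSet, Set

Individual = FrozenSet[str]
Denotation = Set[Individual]


def ind(labels: str) -> Individual:
    """Build an individual from atom labels, e.g. ind('ab') is the sum a+b."""
    return frozenset(labels)


def is_cumulative(d: Denotation) -> bool:
    """True if the sum of any two members is also a member."""
    return all((x | y) in d for x, y in combinations(d, 2))


def minimal_members(d: Denotation) -> Denotation:
    """Members that have no proper part inside the denotation."""
    return {x for x in d if not any(y < x for y in d)}


def has_nonoverlapping_minimal_parts(d: Denotation) -> bool:
    """True if the minimal members are pairwise disjoint (atom-like)."""
    return all(not (x & y) for x, y in combinations(minimal_members(d), 2))


plate = {ind("a"), ind("b"), ind("c")}
sand = {ind("ab"), ind("bc"), ind("ac"), ind("abc")}

print(is_cumulative(plate), has_nonoverlapping_minimal_parts(plate))  # False True
print(is_cumulative(sand), has_nonoverlapping_minimal_parts(sand))    # True False
```

In this model the quantized denotation fails cumulativity but has non-overlapping minimal members, while the mass denotation shows the opposite profile, matching the contrast drawn in the text.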

#### **3.2 Countable mass nouns and sg**

Bare countable mass nouns behave like mass nouns, but when they are marked with the sg suffix, they behave like count nouns. Modeling noun meanings in terms of cumulativity and divisiveness allows us to capture this. Just like true mass nouns, countable mass noun denotations are cumulative. For example, arbitrarily large groups of coconuts and ants can be referred to with a bare countable mass noun. However, like count nouns and unlike true mass nouns, countable mass noun denotations are not divisive: they contain non-overlapping minimal parts. These properties can be captured by assuming that the denotations of countable mass nouns in Guébie contain both non-overlapping individual members and sums of those individual members. A countable mass noun denotation is schematized in (19), where individual letters *a, b* and *c* represent atomic individuals, such as individual coconuts or ants, and combinations of those letters represent sums of those individuals, such as a sum of two or three individual coconuts or ants.

(19) ⟦ɟa<sup>31</sup> 'coconuts'⟧ = {a, b, c, ab, bc, ac, abc}

Since these denotations are cumulative, they cannot combine with pl or be directly modified by numerals, just like true mass nouns. They are crucially different from mass nouns, however, in that their denotations do contain non-overlapping minimal parts. This kind of cumulative but non-divisive noun denotation is also found in English (for "fake mass" nouns like *furniture* and *jewelry*) and in classifier languages like Chinese and Japanese (see Doetjes 1997, Landman 2011, Deal 2017). A piece of furniture plus another piece of furniture is still called *furniture* in English (cumulativity), but a sub-part of a piece of furniture such as the leg of a chair is not *furniture* (non-divisiveness). Just like in Guébie, *furniture* cannot be marked pl (\**furnitures*) or be directly modified by numerals (\**three furniture(s)*). We return to the cross-linguistic picture in the following section.

<sup>8</sup>The role of pl is to add sums to the denotation, thus making the resulting denotation cumulative. There is debate in the literature about the exact nature of pl (e.g. whether the resulting denotation includes atoms as well as sums; see Sauerland et al. 2005, Farkas & de Swart 2010), which we do not wish to address here. The Guébie pl data are compatible with analyses that account for English pl.

<sup>9</sup>This assumes that only sets with non-overlapping members (i.e. quantized denotations) can be counted (Chierchia 1998, Landman 2011). For languages that have pl inflection on nouns that are modified by numerals >1, that pl marking is taken to be either purely morphosyntactic (Krifka 1989) or semantically undone by the numeral modification (Chierchia 1998).

Finally, we propose that this difference is what allows countable mass nouns (but not true mass nouns) to combine with the sg suffix. Specifically, the role of the sg suffix is to take in a countable mass noun denotation like in (19), and remove all non-atomic members. The result is the quantized denotation in (20), which, like the denotation of a count noun, only contains non-overlapping individuals (i.e. individual coconuts or ants).

(20) ⟦ɟa- bə3.1 'coconuts'⟧ = {a, b, c}

Since a sg-marked countable mass noun is now quantized, it can combine with pl marking, just like the quantized bare count nouns. Importantly, sg cannot attach to true mass nouns because their denotations do not contain these non-overlapping minimal parts.

The analysis presented here also allows us to capture the distribution of the quantifiers [a ba4.2] 'all' and [ butugba3.1.1] 'many'. We propose that these quantifiers can only combine with cumulative noun denotations. This allows these quantifiers to combine with the homogeneous denotations of true mass nouns, and with the cumulative but non-divisive denotations of bare countable mass nouns, pl-marked count nouns, and sg-pl-marked countable mass nouns. In contrast, these quantifiers cannot combine with the quantized denotations of bare count nouns and sg-marked countable mass nouns.
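The proposed roles of sg, pl, and the quantifiers can be put together in one small sketch. As before, this is a toy model under stated assumptions: individuals are frozen sets of atom labels, all function names are hypothetical, and pl is modeled as closure under sum that keeps the atoms, which is only one of the options left open in the literature on pl. Numeral modification is left out.

```python
from itertools import combinations


def ind(labels):
    """Build an individual from atom labels, e.g. ind('ab') is the sum a+b."""
    return frozenset(labels)


def is_cumulative(d):
    return all((x | y) in d for x, y in combinations(d, 2))


def minimal_members(d):
    return {x for x in d if not any(y < x for y in d)}


def has_atoms(d):
    """True if the minimal members do not overlap one another."""
    return all(not (x & y) for x, y in combinations(minimal_members(d), 2))


def is_quantized(d):
    return not is_cumulative(d) and has_atoms(d)


def sg(d):
    """Singulative: keep only the atomic members of a countable mass noun."""
    if not (is_cumulative(d) and has_atoms(d)):
        raise ValueError("sg needs a cumulative denotation with atoms")
    return minimal_members(d)


def pl(d):
    """Plural: add all sums to a quantized denotation (atoms are kept)."""
    if not is_quantized(d):
        raise ValueError("pl needs a quantized denotation")
    result = set(d)
    for size in range(2, len(d) + 1):
        for combo in combinations(d, size):
            result.add(frozenset().union(*combo))
    return result


def takes_quantifier(d):
    """'all' and 'many' are taken to require cumulative denotations."""
    return is_cumulative(d)


coconuts = {ind(s) for s in ("a", "b", "c", "ab", "bc", "ac", "abc")}
sand = {ind(s) for s in ("ab", "bc", "ac", "abc")}

one_coconut = sg(coconuts)          # corresponds to (20)
many_coconuts = pl(one_coconut)     # the noun-sg-pl form

print(takes_quantifier(coconuts), takes_quantifier(one_coconut),
      takes_quantifier(many_coconuts))   # True False True
```

In this model `sg(sand)` raises an error because the minimal members of the mass denotation overlap, and `pl(coconuts)` raises an error because the bare form is already cumulative, mirroring the restrictions described in the text.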

### **4 The cross-linguistic picture**

We have seen that Guébie has a core three-way countability distinction in its nominal semantics, and that this three-way distinction can be captured in terms of cumulativity and divisiveness. Similar three-way distinctions are also found in other languages. For example, in addition to the binary mass/count distinction, English also distinguishes a smaller class of "fake mass" nouns like *jewelry, furniture,* and *footwear*. Welsh (Grimm 2012a) has a larger class of nouns that are interpreted as plural in their bare form and require a sg suffix for singular reference. This contrasts with nouns that are interpreted as singular in their bare form (count nouns), and those that cannot take the sg suffix (mass nouns).

Other languages appear to make only a two-way distinction. For example, classifier languages, like Chinese and Japanese, make a countability distinction in terms of divisiveness, but not cumulativity.<sup>10</sup> These languages lack quantized noun denotations; typical count nouns like 'plate' are cumulative in these languages, as indicated in (21). Note that this kind of analysis lends itself to an explanation of both the typical absence of pl marking in such languages and the fact that all nouns in such languages require classifiers in numeral modification.

<sup>10</sup>For evidence of countability distinctions in Chinese and Japanese, see Cheng & Sybesma (1998), Inagaki & Barner (2009), and Cheung et al. (2010). For an explicit proposal in terms of cumulativity and divisiveness, see Deal (2017).

(21)
	- a. Individual-denoting nouns (e.g. 'plate'): {a, b, c, ab, bc, ac, abc}
	- b. Substance-denoting nouns (e.g. 'sand'): {ab, bc, ac, abc}

While cumulative but non-divisive noun denotations are commonly attested cross-linguistically, languages differ in how they treat such denotations. In the first place, languages differ in what objects are assigned cumulative, non-divisive denotations. This class is small in English (*furniture, jewelry, footwear* and *mail*, among others), with most nouns being either truly mass or count. Languages like Guébie and Welsh, in contrast, have very large classes of such nouns, consisting of a wide variety of objects that typically come in groups. Classifier languages like Chinese and Japanese assign all non-substance nouns such denotations.

Second, languages differ in how they allow such nouns to be modified by a numeral. English uses measure words (e.g. *three pieces of furniture*), while Chinese and Japanese have dedicated classifiers. In contrast, Guébie and Welsh have sg suffixes that convert a cumulative, non-divisive noun into a quantized noun.

Finally, while both Guébie and Welsh employ similar strategies for allowing such nouns to be modified by numerals (via a sg suffix), they also show an interesting difference: sg-marked nouns in Guébie can be further pluralized, whereas those in Welsh cannot.

### **5 Conclusion**

Guébie shows a core, three-way countability distinction in its nominal semantics, based on number morphology and numeral modification. A singular suffix takes countable mass nouns and turns them into count nouns. We model these distinctions in terms of cumulativity and divisiveness, which are useful concepts for modeling countability across languages.

## **Acknowledgments**

We are grateful to the Guébie community for sharing their time and their language. Thanks also to the audience at ACAL 49 in Michigan for their feedback.

### **Abbreviations**


### **Appendix A List of countable mass nouns in Guébie**


### **References**


# **Chapter 17**

# **The future of the indigenous languages of Kenya and Tanzania**

Angelina Nduku Kioko<sup>a</sup> & Josephat Rugemalira<sup>b</sup>

<sup>a</sup>US International University-Africa <sup>b</sup>Tumaini University College Dar es Salaam

This paper examines the language policies and practices in Kenya and Tanzania and argues that, in spite of the observable differences between these neighbouring countries, the ethnic community languages face an uncertain future. Although language policies play a role in determining this future, there are stronger forces that defy language policy, viz. population movements, urbanization, technological changes affecting mass communication, and the structure of the economies.

## **1 Introduction**

This paper examines the language policies and practices in Kenya and Tanzania in order to determine their impact on the past and future fortunes of the indigenous languages. The agenda is two-fold. First, it is of interest to determine which policies (if any) are more significant in the endeavour to enhance the fortunes of less powerful languages, i.e. which policies or mix of policies are more effective in the business of language promotion and preservation. Second, we consider the open possibility that the language policies notwithstanding, the powerful forces for change in the social and political systems of the world are set against the long term preservation of linguistic diversity. We show that in spite of the observable differences in language policies and practices between Kenya and Tanzania, the threats to the indigenous languages in both countries are the much more formidable forces beyond the tinkering of politicians and bureaucrats of small states. These forces include population movements, urbanization, technological changes affecting mass communication, and the structure of the economies.

Angelina Nduku Kioko & Josephat Rugemalira. 2022. The future of the indigenous languages of Kenya and Tanzania. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 299–316. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393768

## **2 Hostile language policies and practices in Tanzania**

In this section we show that overall there are more policies and practices hostile to the indigenous languages in Tanzania than in Kenya. After presenting comparative language demographics for the two countries, we proceed to examine the policies and practices that appear to place a stranglehold on the local languages of Tanzania while Kenyan languages appear to get a friendlier environment.

#### **2.1 Demographic basics**

Table 1 shows that only 13.3 million people, i.e. 38% of the population of Tanzania (which was 35m in 2009), speak a language among the top ten. By contrast, 32.8 million Kenyans, i.e. 85% of the Kenyan population (which was 38.6m in 2009), speak one of the top ten languages. This means that it is more difficult for a Tanzanian ethnic community language to break out of the large pack of small community languages – 150 in total – and gain respect and power. In Kenya the top ten languages constitute a quarter of the total number of languages spoken in the country – about forty altogether. Such relatively respectable numerical strength provides a base for public recognition of the languages.
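The proportions quoted above can be reproduced directly from the figures given in the text. The short sketch below is only an arithmetic check; the variable and function names are introduced here for illustration, and the language totals (150 and about 40) are the approximate counts stated in the paragraph.

```python
# Figures as quoted in the text (speakers and population in millions).
top_ten_speakers = {"Tanzania": 13.3, "Kenya": 32.8}
population_2009 = {"Tanzania": 35.0, "Kenya": 38.6}
total_languages = {"Tanzania": 150, "Kenya": 40}  # approximate counts from the text

def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

# Share of the population speaking one of the top ten languages.
speaker_share = {c: share(top_ten_speakers[c], population_2009[c]) for c in population_2009}

# Share of each country's languages that the top ten represent.
language_share = {c: share(10, total_languages[c]) for c in total_languages}

print(speaker_share)   # {'Tanzania': 38, 'Kenya': 85}
print(language_share)  # {'Tanzania': 7, 'Kenya': 25}
```

On these figures the top ten languages make up roughly 7% of Tanzania's languages but a quarter of Kenya's, which is the asymmetry the paragraph describes.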


Table 1: Top ten languages in Kenya and Tanzania (millions of speakers). Sources: Kenya National Bureau of Statistics 2009, Mradi wa Lugha za Tanzania 2009.

#### **2.2 Languages of government**

There is no formal recognition of indigenous languages in Tanzania. Indeed such recognition is regarded as inappropriate and undesirable for the unity of the nation. The first attempt to address language matters, in the aborted 2014 draft constitution, mentions only Kiswahili as the national and official language and English as the other official language. By contrast, the 2010 Kenya constitution guarantees space for indigenous languages. The primary language of government business in Tanzania is Kiswahili; in oral communication English is rather minimal. Even in formal government documents Kiswahili is edging out English. It is indeed arguable that English may still hold on in written communication because some of the government documents need to be presented to foreign governments for consultations on aid.

Parliament is a good illustration of the struggle for turf between Kiswahili and English and a good indicator of the slow pace of change within the legal profession. While all written laws (except the constitution, which is in Swahili and English) are drafted in English, they are orally presented in summary form in Kiswahili and debated in Kiswahili; they are enacted in the English version. Yet MPs need not demonstrate competence in English, although representatives in regional assemblies, such as the East African Community and SADC, defend their candidacy in English within parliament and may lose votes if they cannot express themselves properly. In the courts the language of oral communication is Kiswahili; oral proceedings and the verdict are in Kiswahili, but English is used in written submissions and final judgement.

All this is in stark contrast to the Kenyan scene where the primary language of government business is English and, though Kiswahili is an official language and the national language, its use in government is usually prompted by issues of access, rather than issues of policy preference. It is not normal for a Kenya government document to be written in Kiswahili. In parliament, the constitution allows a choice of either Kiswahili or English; in practice English is the language of parliamentary proceedings. Elected politicians in parliament and county assemblies must demonstrate competence in English and Kiswahili. All written laws including the constitution are in English; bills are drafted and debated in English. In the courts English reigns supreme and Kiswahili may be available via an interpreter; in lower courts an indigenous language may be allowed via an interpreter.

The dominance of Kiswahili in Tanzania government business may have become a major obstacle to the promotion of indigenous languages because of the assumed near total accessibility of the national language. By contrast, the dominance of English in Kenya government business may be seen as a factor helping promote the indigenous languages because of the need to ensure access to government communication.

#### **2.3 Language in education**

In Tanzania Kiswahili is the language of instruction (LOI) in the seven years of primary school (except for a very small section of the population who can access English medium primary schools). English is the LOI at secondary and tertiary levels;<sup>1</sup> it is a compulsory subject from the third year of school. Indigenous languages are banned in school.

Kenya is more accommodating of the indigenous languages, allowing them to be used as languages of instruction in the first three years, alongside Kiswahili. English takes over as LOI from fourth grade. This friendly policy arrangement is largely thwarted in practice because English has become the preferred language of school from nursery level.

Language in other public domains: In Tanzania's mass media, Kiswahili predominates over English. English is sporadic in print media and insignificant in radio and TV. Indigenous languages are banned in the media (Tanzania Communications Regulatory Authority 2005).<sup>2</sup> Kenya, on the other hand, has a vibrant body of broadcasting in indigenous languages even though English is dominant overall in both print and electronic media.

Kiswahili is dominant in the various forms of public entertainment (music, soap opera, stand-up comedy) in both countries, which is a good reflection of the sensitivity to market forces. English may come in second ahead of indigenous languages in this regard. The same pattern of paying attention to the structure of the market may be observed in electoral politics (campaigns for office). In both countries Kiswahili is the language for getting votes: Kenya's political campaigns are mainly in Kiswahili in urban and cosmopolitan areas; in Kiswahili or indigenous languages in the rural areas (except presidential campaigns). English is only used on TV talk shows. In Tanzania Kiswahili is the language of politics, whereas English, while permitted, is virtually unusable. The indigenous languages however are banned, being covered by the "divisive language" label of campaign regulations (Tanzania Government 2010).

<sup>1</sup>This policy has been reaffirmed over the past several decades in spite of the perennial debate about its appropriateness and effectiveness, cf. Tanzania Government (1995, 2014), Qorro et al. (2012).

<sup>2</sup>The government has not registered any indigenous language newspaper since independence even though the relevant laws have not explicitly banned such outlets (Rugemalira 2013: 68–69). Papers registered before independence eventually died for various reasons – including an environment hostile to indigenous languages (Madumulla 2007: 92).

Language choice in product packaging, user manuals, and advertising can be a good indicator of market structure, but there is considerable sophistication here. English appears to be dominant in both countries, which may be a sign of external dependencies of the economies (in imports and exports, and historical links of the particular industry). Bilingual labeling in English and Kiswahili is sporadic – apparently without being required by government regulations.<sup>3</sup> A study to determine the level of sophistication in audience targeting might clarify the advertising scene somewhat: for instance, how do the cell-phone companies choose the language for their myriad promotional drives?<sup>4</sup> In this domain, in both countries, the indigenous languages are at the losing end, even if more so in Tanzania.

In the sphere of organized religion, Kiswahili is in dominant use in Tanzania – even in rural congregations. Translations of the bible, catechism/religious instruction, and hymns in the indigenous languages have fallen into disuse. Popular gospel music is in Kiswahili – another nod to market forces. While there are no government regulations on this, it is notable that the churches and mosques have promoted Kiswahili services. In Kenya, English and Kiswahili religious services are common in urban and cosmopolitan areas. In the rural areas indigenous languages are more prevalent while some services in Kiswahili or English may be found. Hymnal and bible translations into indigenous languages are still in popular usage even in urban areas.

#### **2.4 Language attitudes**

The stigma attached to indigenous languages as "dangerously tribalistic and retrogressive" was most explicitly articulated by Nyerere (1995) at his party's conference to nominate a presidential candidate for the first multi-party elections in Tanzania. Although the point he was making about the imperative of nominating a candidate on grounds of leadership merit rather than narrow ethnic belonging is crucial, the damage to the cause of indigenous languages and cultures was colossal: in that speech he maintained that the only use left for "tribal languages" was in tribal religious ritual ("*kutambika*") and that in this age only fools would engage in tribal-based associations. Kiswahili on the other hand is the language of national unity, progress and political correctness. As for English, the ambivalence in attitudes is oppressive: it is the language of the erstwhile colonial master and the imperialist powers; but it is also the language of individual power and prestige, and anyone who amasses sufficient linguistic capital in this form is sure to go places and climb higher on the social ladder. This would account for the "show-off" phenomenon whereby some speakers engage in uncontrolled codemixing involving English and Kiswahili, in very formal contexts regardless of the needs of the audience.<sup>5</sup> Still, conditions for the mastery of English in the education system are tough, and in communication encounters the advice for many would be to avoid it as best you can lest you face humiliation.

<sup>3</sup>How often have you come across a Swahili label on a bottle of water or soda, a bag of cement, or a packet of medicines?

<sup>4</sup>It is kind of fashionable to choose an English trade name: "the linguistic market is strongly influenced by English in ads, business, trade and commerce, banks, technology, TV and in other domains that are a concomitant of Western lifestyle." (Legère 2010: 64)

In Kenya, there is a relatively strong and positive attachment to indigenous languages; but you speak English to colleagues and the boss at the office, Kiswahili to the house servant, and the indigenous language to your tribesman to signal belonging.

### **3 Emerging patterns of promise**

There are significant differences between Kenya and Tanzania regarding the way the ethnic community languages (ECLs) are treated. In Kenya the ECLs appear to have far more habitable space, in both public and private domains, than is the case in Tanzania, where official policies and practices as well as private preferences appear to disregard or actively suppress these languages. The notable legal, constitutional, and language-in-education policy openings that provide relief to Kenya's ECLs have created three potential growth poles for the languages.

<sup>5</sup> "As for language choices, a laissez-faire approach is observed in the recent past. Swahili still holds a strong position, but there is a tendency in public that some people tend to demonstrate their (mostly rudimentary) knowledge of English by inserting English words (articulated with a terrible Swahili accent) whenever possible. They produce a speech variety that is basically Swahili, but intended to demonstrate the status of the speaker who is eager to distinguish him/herself from others who do not dispose of an English vocabulary." (Legère 2010: 54)

The first and more vibrant pole revolves around the use of the local languages in the media. As noted in the previous section, ECL print media in Kenya has been quickly overshadowed by the fast growth of digital channels – TV and FM radio. It would be safe to say that most ECLs in Kenya can boast of at least one FM radio station. In any case the government-owned Kenya Broadcasting Corporation maintains scheduled radio programmes in many of the ECLs in Kenya. Many of the larger communities have access to several TV and radio stations. According to the records from the Communication Authority of Kenya [CAK], there are currently 64 registered "vernacular" FM radio stations and 20 registered "vernacular" TV stations (Personal Communication with CAK officials). The growth, especially of the TV stations, is quite recent but very rapid. This is in spite of the initial persecution of promoters of vernacular radio stations in the 1990s and early 2000s, and the perception of some politicians that the vernacular radio stations are a threat to national unity. It is important to note that there has not been any legislation prohibiting the use of ECLs in media; on the contrary, the Kenyan constitution guarantees citizens linguistic rights:


Chapter 4 has the following provisions:


The competitive atmosphere among ethnic groups promotes these outlets and the languages they serve. In the rural areas, in particular, these stations are the primary sources of information and entertainment. In addition to the usual entertainment and popular adverts, these stations run programs with diverse content including civic education, instructions on relevant farming methods, information on health related issues, and religious instructions.

These media outfits/firms create jobs and encourage entrepreneurs to thrive.<sup>6</sup> They are seen as part of the creative economies. They make literacy in these languages meaningful and may become a catalyst that creates demand for ECL courses in school beyond third grade. The young speakers and programmers in these stations are quite good in their use of the ECLs even though it is not clear where they polished these skills, since ECLs are not taught in the schools beyond primary three.

The second growth pole for the ECLs in Kenya is rooted in the language education policy. The policy states that during the first three years children should be taught in the language of the catchment area. In the homogeneous rural areas, where the majority of Kenyans currently live, the language of the catchment area is an ECL. In addition to use as languages of instruction in these early years, the ECLs are taught as subjects in the first three years of school. It is expected that learners in the rural areas will develop basic literacy skills within that time. The Kenya Institute of Curriculum Development has a department that deals with "mother tongue". With the liberalization of the production of learning materials, this department is expected to evaluate materials for teaching the various mother tongues. The department works with language committees from the various language groups to ensure the development and use of standard orthographies.

The third growth pole pertains to the religious sphere. In this respect, comparisons between Kenya and Tanzania ought to be much closer because of the absence of government regulation on the Tanzanian side, and because in both countries the relevant organized religion is Christianity since Islam does not appear to have transcended its Arabic and Kiswahili avenues.

The production of new or revised versions of the bible in various ECLs is considerably vibrant, often with active community involvement. The significance of these initiatives in setting standards for orthography and formality, as well as language boundaries, has long been recognized even if inter-denominational and inter-ethnic rivalry may have created unnecessary confusion and duplication in some instances (Kioko 2017: 242). For many ECLs in both Kenya and especially Tanzania, religious materials [bible, hymn book, prayer book, religious instruction book/catechism], are available mainly in print but also in digital form on smartphones or "audio proclaimers", and religious gatherings provide the main public domain of use.

<sup>6</sup>What may be described as the YouTube of Kenya, viz. Viusasa, provides access to original videos/films in some ECLs as well as foreign language films with subtitles or voicing in ECLs.

## **4 Limitations in growth poles**

Kenya's ECL mass media, in spite of the growth promise they hold out, have to contend with the fact that they are circumscribed within an ideological frame that regards the languages as a third tier means of communication in a restricted range of domains at best, or as mere symbols for a glorified rural existence. The restriction in terms of audience is apparent amongst ECL speakers in the city. In the urban centres the audience consists largely of speakers above 40 years of age, but for the younger generation, these are the stations to turn on to keep visiting grandparents informed/entertained.

Even more disturbing is the lingering suspicion that ECLs can easily be transformed into lethal tools for rabble-rousers and perpetrators of genocide. Hence one radio station is accused of having fomented the 2008 killings in the Rift Valley in much the same way as the *Rwandese Radio Television Libre des Mille Collines* did during the 1994 killings in Rwanda. One of the accused six who appeared before the International Criminal Court in The Hague after the 2008 ethnic clashes in Kenya was a broadcaster with one of the ECL radio stations. Vociferous debates calling for the ban of the "vernacular" stations have also been witnessed in the Kenyan parliament despite the constitutional protection of the ECLs. For example Kioko (2013: 123) notes:

Introducing a bill in parliament to ban the use of indigenous languages (other than Kiswahili) in official settings on the 8th of June 2011, a law maker made the following remarks "Article 7(2) of the Constitution recognises Kiswahili and English as the official languages of the Republic of Kenya; aware this provision will address ethnic disharmony in public offices if implemented to the letter...concerned that the use of indigenous languages in public offices and national institutions is a major contributor to disharmony, suspicion, and discomfort in public offices in the country, this house urges the government to ban the use of indigenous languages in all offices". (*Kenya National Assembly Official Records* 2011: 21)

The language committees as the gate keepers of ECL development have generally been made irrelevant by the fact that the policy requiring the use of the mother tongue in the first three years of school is not strictly adhered to:

Both formal research and informal observations indicate that, in some countries of Africa, national policy regarding language of instruction is not being followed. In Kenya, for example, the national policy calls for the mother tongue or "language of the catchment area" to be used as the medium of instruction through Grade 3. However, Piper (2010) shows that in fact, the language used between 70 per cent and 80 per cent of the classroom time in Grades 1–3 is English – not Swahili and not the mother tongue of the students. This is true even in rural environments, where fluency in English was extremely low among the students. (Trudell 2013: 156)

Thus not much writing and publication is happening in the ECLs in spite of the policy and the measures to ensure standardization. It would appear that the raison d'être for the Kenya language committees, viz. evaluation of school materials in ECLs, is largely disappearing. Even where a strong language committee is still operational, as is the case with the Kikuyu committee, the engagement is more on language development and modernization than on evaluating learning materials because very few pedagogical materials are getting published.

This situation is surprisingly similar to that found in the rather popular English medium primary schools in Tanzania where children in the first two years of school are not even allowed to learn to read and write Kiswahili. The new curriculum requires that children focus on the three Rs in Standard One and Two – using Kiswahili *or* English, and no other subjects are allowed:

Therefore, the development of competence in the 3Rs in English-medium schools in Standards I and II will be carried out in English. The teaching of other subjects, including Kiswahili, will be introduced in Standard III. (Tanzania Government 2016: 2)

Religious literature in ECLs is available in a very restricted domain even as it is supposed to appeal to the heart. It is understandable that in urban areas such literature would be of very limited use because of the mixed nature of congregations – except perhaps for rather ethnically skewed denominations like the Presbyterian Church of East Africa which is predominantly Kikuyu in Kenya and runs Kikuyu services even in the city. In the rural areas there ought to be more room for using ECL versions of the religious materials. In Tanzania this is demonstrably not the case (Madumulla 2007, Muzale & Rugemalira 2008, Rugemalira 2013).<sup>7</sup> Even in rural Kenya the ECLs are not used exclusively; functions where speakers from other communities are present get conducted in English, even when the focus is on a particular ECL. Consider the example of the launches of the Meru bible<sup>8</sup> and the Kikuyu bible<sup>9</sup>. On both occasions held in the respective heartlands of the speech communities concerned, proceedings were conducted mainly in English, and prominent guests (including church leaders, government ministers and civil servants from the focus ECL) addressed the audience in English.

<sup>7</sup> "There are many languages in Uganda and Tanzania with Scriptures available in them, but it is a challenge to discern exactly why they are not being used" (Liz Thomson of SIL, personal communication, 2010).

<sup>8</sup>https://www.youtube.com/watch?v=mLVq88X7eIE

It is possible that these choices of language were driven by logistical considerations [deference to invited non-ECL guests and benefactors] or the desire by public/political figures to project a national perspective/image. Whatever the case, there appears to be an underlying uneasiness when it comes to using ECLs in high level formal meetings. The people concerned appear to be anxious to show, even without being prompted, that they are nationalists rather than tribal chauvinists. Such behavior on the part of the Kenyans tempts one to regard it as "unity envy"<sup>10</sup> as they cast an eye at neighbouring Tanzania.<sup>11</sup> And considering the precarious state of ECLs in Tanzania, it is arguable that Kenya may be starting to tread the same path.

## **5 Killer languages**

Some language choices made by Kenyans surprise their Tanzanian friends. Why would a father and his son speak in English at the funeral of the wife/mother? Why would parents address the guests at their son's wedding in English? These questions raise the issue of whether English is the "killer language" encroaching on the space of the ECLs in Kenya. Muthwii & Kioko (2002) surveyed the language choice and language use patterns in both urban and rural areas among five main ECLs. The research found that both Kiswahili and English are used in the home domain even in the rural areas at varying levels among the five ethnic groups. Of significant mention are the leading ethnic groups in the choices between ECL, Kiswahili and English as the language in the home: the Kikuyu community was found to use ECL at home even in the urban areas; the Luhya community led in the use of Kiswahili at home even in the rural areas; the Luo community was leading in the use of English at home even in the rural areas; the Kalenjin community significantly used Kiswahili at home and the Akamba community significantly used ECL at home even in the urban areas. This is a clear indication that even in the home domain, the languages of wider communication have made significant inroads.

<sup>9</sup>https://www.youtube.com/watch?v=1TYgLTC6X38

<sup>10</sup>"In the former Yugoslavia, Serbia and Croatia used to have one common language. But since Croatia became an independent republic, interviews with or statements made by Serbs are subtitled in 'pure' Croatian before they are broadcasted on the Croatian national TV. The suggestion is: we don't understand this strange, foreign language; our language is different from that of the Serbs. The wider context for this peculiar phenomenon of creating differences out of similarities is that of nation building and radical nationalism" (Blommaert 2014: 1).

<sup>11</sup>Why would Kenyans be "envious" of their southern neighbour? "… many will vehemently oppose any move to consider their speech variety as a 'dialect' of another 'language' … even prominent linguists join hands with politicians to agitate for and celebrate the production of written material in their speech variety, even when for years they have read and understood current material written in a related variety. It is an issue of ethnic identity, [which explains] the obsession with emphasizing the differences" (Kioko 2017: 244)

Because Kiswahili in Kenya was not, until more recently, as aggressively promoted as it was in Tanzania, the local languages appear to have more space in the public domain than is the case in Tanzania. Unlike in Tanzania, in Kenya Kiswahili became a compulsory subject in the schools much later, in the 1980s. This lapse deprived Kenya of the chance of creating a common national language that would neutrally be available to both the elite (the highly educated) and the masses (with only primary education). That gap has for a while been filled by English for a significant section of the Kenyan [educated] population. Kiswahili is coming from behind to claim that role, particularly for the section of the population with less than secondary education, but the competition for space involving English, Kiswahili and the ECLs is stiff. In the long term it may be safe to say that Kiswahili is going to expand its turf at the expense of both English and the ECLs.

In Tanzania, Kiswahili is much more clearly a greater danger for the ECLs and is bound to replace them as the language of normal everyday communication, partly driven by the education system and other regulatory arrangements that limit ECL domains of use. However, the forces that threaten the ECLs appear to be too strong for normal treatment via the education system or other regulatory policy. It is easy to overestimate the impact of language policy in education, citing, for instance, the apparent growth of Swahili in Kenya since the study of the language was made compulsory in the schools. But consider related negative evidence in the Tanzanian context. First, the dismal state of English in Tanzania persists (deteriorates) in spite of the fact that it is a compulsory subject at virtually all levels of the education system (including university), and the language of instruction at all levels except the initial seven years. Second, in spite of the laissez-faire situation regarding the languages of worship, the organized religions in Tanzania have largely shunned the ECLs.<sup>12</sup>

<sup>12</sup>This is partly understandable in the wider context that Blommaert (2014: 8) describes: "Swahili was swept up in a wave of massive nation building exercises in the late 1960s and 1970s, driven by and incorporated in the state ideology of *Ujamaa*. … the Tanzanian state made a successful attempt (successful, at least, for some time) at ideological hegemony, … Swahili was given a prominent role in this process of homogenization. Swahili was, thus, deliberately constructed, manufactured, and not 'just' as a language, but as an overdetermined emblem of national belonging and ideological rectitude."

## **6 The forces against indigenous languages**

#### **6.1 The structure of the economies**

It has been argued that languages do not easily appear or disappear by legislative fiat. Rather, languages whose speakers are dominant in the "production and consumption interdependencies" will become dominant languages and attract speakers of marginalized communities who may thereby shift linguistic allegiances over time (Mufwene 2004: 218). The dominance of English in East Africa as the language of the state goes back to British conquest and to current American global dominance. It is the language of choice of a small section of the population (the elite) whose production and consumption patterns are more closely tied to the English-speaking nations abroad. The second-class position of Kiswahili [relative to English] parallels its being a language of a larger section of the population that is struggling to find a precarious foothold in the modern economy as rather mobile labour with a limited/slippery say in the relations of production. Yet in relation to the ECLs, Kiswahili is a dominant language. Its position as a national and official language in Kenya and Tanzania acknowledges its power as a lingua franca without a rival among the ECLs.<sup>13</sup> These latter are largely spoken by a rural agricultural/pastoral population that has an even weaker hold on the surplus product of the land. As a consequence, Kiswahili is an attractive alternative to the ECLs and is within reach for a bigger section of the communities. English by contrast is beyond the reach of most people and so cannot be a realistic target for shift by the majority of ECL speakers.

Parents' language preferences for their children in the schools provide a good illustration of the patterns of language shift in progress. In Tanzania it would be very difficult to find parents who want their children to be taught in the ECL. Instead, in the rural areas, they support the children's immediate immersion into Kiswahili on the very first day of school and would regard any ECL instruction as a retrogressive measure – possibly a conspiracy to deprive their children of a chance to forge ahead in the national economic and social network. Furthermore, the phenomenon of English-medium primary schools, not just in the urban areas but also in relatively rural settings, attests to the desire to participate in that imagined international community that communicates in English and is visibly represented by local elites with relatively good jobs and considerable power. This wish is realized to a greater extent in Kenya where even the official policy on LOI in the first three years of school is disregarded. As already noted, of course, in both Kenya and Tanzania, the reality is that the majority of people will find Kiswahili to be the realistic target for language shift as they get sorted into their respective economic and social slots.

<sup>13</sup>"Language shift [is] an adaptive response to changing socioeconomic conditions [which have] undervalued and marginalized" indigenous languages (Mufwene 2004: 207).

Technological changes affecting mass communication: The ease with which messages are created and disseminated via the smartphone, as well as the language processing technology underlying the gadgets, will have a profound impact on the future of the ECLs. Social media operate across local language boundaries; they amplify exposure to dominant languages and other forms of self-expression that undermine ECLs by targeting a wider audience; radio and TV programming will be targeting wider audiences so that the local village radio station [where permitted] may have rather limited impact.

#### **6.2 Urbanization**

The large cities of Kenya and Tanzania are growing at a phenomenal, perhaps uncontrollable, rate. Placed in the global context, such growth is not peculiar. The world population is already 50% urban and is forecast to be 66% urban by the year 2050 (URBANET 2019). The approximately 5 million inhabitants of Dar es Salaam constitute 10% of the population of Tanzania – which is 32% urban. Similarly, the 4 million inhabitants of Nairobi make up 8% of the Kenyan population – which is 26% urban. The bigger picture is that even the numerous small settlements along major highways or rural roads are a threat to the ECLs because their inhabitants operate in Kiswahili or English. In particular, the children born in such contexts, even if their parents have a common language, are likely to be Kiswahili first language speakers.

#### **6.3 Population movements**

Besides the rural to urban migration already discussed, there is another wave of population movements which may be largely rural and still threaten ECL vitality. Traditionally each language would be conceived as inhabiting a clearly demarcated geographical area. However, increasing populations and diminishing resources (land, water) have been forcing different speech communities to live in the same space.

In Kenya the Rift Valley province is a prime example of such co-existence with speakers of Kikuyu, Kalenjin, Luhya, and Kisii living in close proximity. The more economically and politically dominant Kikuyu are often accused of not wanting to learn the language of the "natives", but it is fair to note that no community learns the language of the other community. As a result, the dominant lingua franca, viz. Kiswahili, has developed fairly well. Similarly, the Kamba speakers have dispersed out of their cradle land to various parts of the Coastal province. Their Kiswahili has prospered as a result. In both cases the threat to the ECL is growing because it is confined to the home even in such rural contexts.

In Tanzania large rural movements are associated with speakers of Maasai and Sukuma. From their Kenya/Tanzania border in the north, the Maasai have ventured as far south as the border with Mozambique/Malawi/Zambia. Similarly, the Sukuma are no longer confined to the south shores of Lake Victoria but have moved all the way south to Mbeya, Iringa and Morogoro, just like the Maasai (Muzale & Rugemalira 2008). The scale of these movements has gained constant attention, particularly via the frequent media reports about clashes between the pastoralist Maasai and the settled agriculturalists in Morogoro and Coast regions. Similarly, government operations to remove large herds of cattle from wetlands and reserve lands/forests have involved these cattle-keeping communities. As in Kenya, the migrations do not foster the ECLs; rather they create conditions for the lingua franca, viz. Kiswahili, to prosper.

## **7 Conclusion**

It may be prudent to make a rough distinction between language promotion/revitalization endeavours on the one hand and language documentation/conservation initiatives on the other. Activities that make a significant contribution to the active use and promotion of ECLs in Kenya include the teaching of these languages in the school system, and the publication and dissemination of printed materials in various civic educational campaigns pertaining to such matters as health, agriculture, animal husbandry, governance and human rights. The active use of the ECLs in the mass media and in worship may help some of the major speech communities to develop and hold onto these forms of expression for much longer than others. This paper poses a pertinent question in relation to the promotion endeavours: who holds the key to the promotion of ECLs? Are these initiatives that language researchers can contribute to or are these initiatives that only the speakers of the language can engage in? Would language researchers' engagement with the promotion endeavours in the Kenyan setting help in the maintenance or even development of the ECLs in the country? Or is the downward drift without a turn-around button?

Documentation/conservation efforts focus on the gathering and preservation of records of instances of language use, museum style, as part of humanity's intangible cultural heritage. This is the main thrust of a number of scholarly initiatives where the funding authority specifies strict criteria for identifying an endangered language, using an index of language vitality (UNESCO 2003). Typical products would traditionally include a descriptive grammar and a word list/dictionary. Modern technology has made it easier to capture audio and visual records of speakers. Should this be the main thrust of scholarly initiatives with regard to endangered languages?

The Kenyan context contains a number of activities that still keep the ECLs in the public domain. How long it will take before these activities become part of museum records is a matter for considerable debate. The Tanzanian context suggests that ECL promotion is a lost cause, and given the close parallels between the two countries, Kenya cannot be far behind in the relegation of the ECLs to the museum. This is not a judgement on the desirability of linguistic diversity or on the moral grounds of the linguistic human rights movement. Rather it is an attempt to answer the question whether speech communities can turn or fight the tidal wave<sup>14</sup> of dominant languages and reverse the misfortunes of an endangered language.<sup>15</sup> And given the forces at play in Kenya and Tanzania the answer seems to be negative. In many of the cases, minority language advocates are viewed as parochial tribalists or sentimental "small is beautiful" enthusiasts. Promotional efforts have even been construed as attempts to deny the weak a chance to advance and catch up with dominant groups (in education, political power, economic advancement) by keeping the dominant language out of their reach (Mkude 2002). Hence the "suicidal" wish of marginalized speech communities is not an irrational psychological malady that requires psychotherapy and counseling. Rather it is a rational/shrewd assessment of the best interests of such communities, particularly the future fortunes of their offspring.

<sup>14</sup>"There are currently 7000 languages spoken in the world, and at least half are projected to disappear in this century. The Endangered Language Fund is helping to stem the tide". Endangered Language Fund, http://www.endangeredlanguagefund.org/

<sup>15</sup>"Ikiwa vikwazo hivyo havitatafutiwa ufumbuzi, basi juhudi hizi za kujaribu kuziinua zitakuwa ni sawa na mateke ya punda afaye, hazitasaidia. Tafiti, maandiko ya istilahi, sarufi na makamusi, pamoja na tafsiri zinazofanywa hivi sasa – bila ya kuvikabili vikwazo hivyo kwanza – zitakuwa ni amali za kuzipeleka kwenye majumba ya kumbukumbu tu ili zikapewe jina la nyaraka kuukuu kwa ajili ya kukoleza tafiti na simulizi za vizazi vijavyo kuhusu zama za wahenga wao" ['If these obstacles do not get a solution, then attempts to promote the indigenous languages will be like the kicks of a dying donkey; they will be useless. Research, publication of technical terms, grammars and dictionaries, together with translations currently being produced – without addressing the obstacles first – will amount to materials bound for museums, to be regarded as archives for the enrichment of research and conversations of future generations regarding the era of their ancestors.'] (Madumulla 2007: 99).

## **References**


# **Chapter 18**

# **Discursive strategies for managing bad news: Exemplification from Akan (Ghana)**

## Samuel Gyasi Obeng

Indiana University

Bad news is a problem for both news bearers and news recipients, especially in situations where apprehensions run high given that it may run counter to people's in situ social and psychological needs (Maynard 2003). The object of this paper was to examine the discursive strategies used by diseased individuals and their caregivers to deliver and manage their bad news. In pursuing the above objective, transcripts of narratives collected from diseased individuals and their caregivers were subjected to empirical inspection with the view to determining the communicative strategies they employed to deal with their special situation. The study was done within the framework of language and liberty (Obeng 2018, 2020) and the results showed that disease and "powerful" actors intrude on diseased individuals' and care-givers' negative liberty (by encroaching on their fundamental freedoms) and positive liberty (by preventing them from participating in their family and communal lives). Common linguistic strategies used in talking about disease and in seeking and protecting participants' liberty include: silence, hesitations, reduplication, adjectives of quality, adverbs and intensifiers, verbs denoting physical sensation, and factive formulae (for evidentiality and credence). Discourse-pragmatic strategies for delivering bad news and for seeking liberty include the speech acts of complaining, blaming and assuring. Other strategies include avoidance, inferencing and polyvocality. It is concluded that to protect diseased individuals' liberty from and liberty to, there is the need to put in place rights that protect these freedoms and empower diseased individuals to participate in their family and communal lives. Also, society must understand the communicational mores surrounding bad news delivery and management and be "educated" about the intertwining nature of language and care-giving.

Samuel Gyasi Obeng. 2022. Discursive strategies for managing bad news: Exemplification from Akan (Ghana). In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 317–344. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393801

### **1 Introduction**

Communication between care-givers, patients and families is an essential component of any high-quality care, especially in cases where the illness is serious and anxieties are high. Maynard & Freese (2012), in a study about how participants in interaction manage affective experiences and reactions to good news and bad news delivery, noted how the delivery or receipt of news (both good and bad) momentarily disrupts recipients' involvement in a social world whose contours and features they typically take for granted, as well as the power of such news to evoke and display strong emotion in the news recipient.

Bad news includes news that conveys the results of a deadly and/or socially stigmatized disease, news that discusses and/or provides information about disease and death, or any form of tragedy or misfortune related to disease management, death and dying. In an earlier study, Maynard (2003) noted that breaking bad news is a real problem for both news bearers and news recipients, especially in situations where anxieties run high; a claim supported by Fallowfield & Jenkins (2004) and Brown et al. (2009). Indeed, as Maynard (2003) elucidates, people are reluctant to transmit bad news because of its communicative difficulty, the face threat it imposes on both the news bearer and the news recipient, and the impact (social, financial, etc.) it has on news recipients and their immediate families. According to Tesser & Rosen (1975) and Weenig et al. (2014, 2001), bad news with indefinite consequences is transmitted more often than bad news with definite consequences.

Research has shown that even though bad news delivery is so important, its acquisition and management within the care-giving professions are often not included in the training curriculum of care-giving institutions. Current studies into the training of medical personnel all emphasize the need to educate such personnel about appropriate and effective ways of breaking and managing bad news. For example, in a study that involved a communication skills workshop for nephrology fellows focusing on delivering bad news and helping patients define care goals, including end-of-life preferences, Schell et al. (2013) discovered that less than one-third of the studied nephrologists reported prior palliative care training. All respondents felt that communication skills were important to being a "great nephrologist," and that an essential part of communication must involve the ability to deliver bad news, express empathy, and discuss dialysis initiation and withdrawal. Another study by Bays et al. (2014) also emphasized the importance of providing training for health care personnel in breaking bad news to, and managing it with, their patients, with the researchers concluding that communication skills intervention was associated with improvement in trainees' skills in giving bad news and expressing empathy.

Another study that dealt with the need to have effective communication skills in breaking and managing bad news was that by Walczak et al. (2016) wherein the authors identified and synthesized evidence for interventions including communication skills training, education, advance care planning, and structured practice changes that targeted end-of-life communication. Participants targeted included patients, caregivers, healthcare professionals and multiple stakeholders. The researchers discovered that interventions targeting patients and caregivers are essential to breaking and managing bad news; however, they also found out that barriers to end-of-life communication may more effectively be removed via multi-focal interventions.

In another study about patients' perception regarding the disclosure of news about their cancer as regards physician counselling and how they perceived the flow of information between hospital‐based and family physicians, Spiegel et al. (2009) discovered that 37.7% of their respondents viewed the news disclosure about their cancer as being presented to them "very empathically" or "empathically." Nearly two-thirds of the patients (62.3%) stated that the news disclosure about their cancer was presented to them "not so empathically" or "not at all empathically." Most importantly, the researchers observed the important role of family physicians in breaking bad news to the patients. Specifically, they discovered that patients were more likely to state that the bad news about their cancer had been delivered "very empathically" or "empathically" when counselled by family physicians in contrast to when they had been counselled by hospital‐oncologists or self‐employed specialists.

Furthermore, a great majority of patients (81.8%) felt that they had been given adequate opportunity to ask the questions that they considered to be important when they were counselled by a family physician. Only 43.5% felt that they had been given adequate opportunity to ask the questions that they considered to be important when they were counselled by a hospital‐oncologist or a self‐employed specialist (44.3%). Also discovered by the researchers was the fact that 56.8% of the patients preferred to discuss the suggested cancer therapies with an oncologist. The above discoveries led the researchers to conclude that it is essential for oncologists to involve family physicians in breaking bad news to patients.

Within discourse theory, managing bad news falls under therapeutic discourse that involves talk-in-interaction that represents the social practice between clinicians and clients (Leahy 2004), talk-in-interaction between diseased individuals, their caregivers and other discourse types with the potential to bring emotional, social, and physical relief, as well as discourse types aimed at assisting victims to cope with and/or adapt to their difficult situations (Obeng 2008).


On the discursive strategies for breaking and managing bad news, research has shown that within the western medical discourse ecology, hesitations, qualifiers, and maneuver or circumlocution (Maynard 2003, Brown et al. 2009) and prosody, especially volume and pitch (Maynard & Freese 2012), are some of the common strategies used in delivering and managing bad news. In Akan (Ghana), Obeng (2008) identified "speaking to the wind" (addressing God, a deity or a person who is not in a ratified state of talk with you), songs, conventional indirectness, nonconventional indirectness (especially "twisted speech", e.g., idioms and proverbs), and the use of different *wh*-question types as being some of the strategies for breaking and managing bad news. Obeng also noted that in Ghanaian society, there is the tendency for diseased individuals to blame their disease on other people (sometimes even on their caregivers), and that being able to blame others offers some relief since it takes responsibility away from the diseased individuals as being the source of their illness, even if such communicative behavior creates tensions between them and those they blame for their illness. Indeed, on the conveyance of diseased individuals' emotional state, Kalmykova & Mergenthaler (1998) argue that narratives serve as one of the commonest and best communicative means to convey speakers' emotional states given that they are able to stimulate responses of listeners.

Another strategy for breaking bad news, according to Movahedi (1996), is through a second language given that such a language provides space where certain personal and cultural facets of a diseased individual or a caregiver may be more easily verbalized than through their first language. Specifically, there is the tendency for a diseased bilingual to resort to code-switching when breaking and talking about bad news since such bad news tends to be more 'tellable' in a less familiar language or register.

The World Health Organization's Constitution (World Health Organization 1946) observes the right to health as one of a set of internationally agreed human rights standards which is inseparable or "indivisible" from other rights such as that to be free from intrusion from the state or institutions affiliated with a state or the right to participate in one's communal affairs. Other rights include the right to vote, the right to free speech, and the right to provide for the underlying determinants of health, such as safe and potable water, sanitation, food, housing, health-related information and education, and gender equality. However, thus far, there has been no study of the nature and/or type of language used by diseased individuals to seek liberty from their care-givers (including health personnel) or people around them while breaking and dealing with their bad news or when dealing with their unique burdensome health communication; this study seeks to fill the gap.

### **2 Theoretical framework**

The study is done within the framework of language and liberty (Obeng 2020, 2018). I view health as both a *human right* (by justifiably belonging to every person), and a *civil right* (by requiring society to ensure the right to equality of health for all). Obeng's theory on language and liberty is inspired by Sir Isaiah Berlin's (1960) theory on *liberty* which includes *liberty from* (*negative liberty* which involves the protection of individuals from the intrusions of the society and others into their fundamental freedoms), and *liberty to* (also called *positive liberty*, which guarantees the right of individuals to participate in governance and to share in the political power of their communities). Positive liberty includes the right to freedom and/or independence at the various levels of an individual's community.

In working within the theory of *language and liberty*, it is important to establish the fact that besides liberty being a philosophical, political and juridical concept, it is also a health concept, and that language is used to express liberty and the above associated concepts just as liberty depends on language to become a reality. Therefore, in working within the theory of language and liberty, I posit: (a) that disease, caregivers (community), and sometimes the state, may encroach on diseased individuals' negative liberty by forcing them into certain states of health (physical, emotional etc.), and thereby preventing them from doing what they can otherwise do; and (b) that disease, caregivers (community), and sometimes the state, may deny diseased individuals their positive liberty by preventing them from exercising their right to participate and share in their family and communal lives. As a linguistic concept, language and liberty are intertwined. In particular, liberty informs and is informed by language and relies on language to become reality.

Working within the theory of language and liberty requires me to examine such important issues as the nature of material communicative and culturally congruent conditions put in place (or made available) to ensure that diseased individuals can seek and maintain liberty (that is, be guaranteed liberty). In particular, I examine the available culturally congruent communicational freedom mores and/or framework which protect individuals' liberties and thus guarantee liberty in both the negative and positive senses. Specifically, I examine the linguistic and discourse-pragmatic features that are available and allowable for use by diseased individuals to vent their socio-emotional frustrations, challenge their care-givers if need be, and seek material support to enable them to deal with their situation, among others. The absence of such culturally congruent material and communicational freedom mores in a community creates a situation in which people in power (care-givers, lineage elders, etc.) may needlessly infringe on diseased individuals' negative and positive liberties with impunity, especially when such individuals hold divergent views about the nature and/or type of their care. Two important questions I attempt to answer are:


## **3 Aim and method**

This paper aims at recapitulating and extending work I did on Akan therapeutic discourse (Obeng 2008) by examining the communicative strategies used by diseased individuals, caregivers, and persons they interact with, to seek liberty while breaking and managing bad news. In pursuing the above aims, I examine:


that are used by victims of socially stigmatized diseases and their immediate care-givers, relatives or friends to seek liberty while managing their bad news.

Data for the study are made up of transcripts of 6 recorded interactions collected in Akan-Twi, in Asuom and Accra (Ghana). Data collection spanned August 2015 to October 2015 and February 2016 to March 2016. Participants included 6 diseased individuals and 13 caregivers. The data content dealt with sickness and death. Specifically, the data consisted of discourses involving diseased individuals' lived experiences and those of their caregivers, and interactions about death, especially, impending death. A form of the Akan orthography is used in the transcription and participants' initials are used to create anonymity.

The data were chosen because they involved the delivery and management of bad news and were produced by care-givers and diseased individuals who lived with and thus experienced bad news. Participants' lived experiences either as diseased persons or as close relatives caring for the diseased persons meant they experienced the denial of liberty either as agents or patients thereby making their assertions and claims about liberty authentic. Also, the data are replete with speech acts that are common in healthcare management discourse. Among such speech acts are: complaints relating to uncertainty about diagnoses, insufficient care, or lack of care; criticism of the nature of care of care-givers or of the diseased individuals; request for help (financial or moral support); assurance; among others. Through the data we are led into the studied participants' worldview about lived experiences relating to the pursuit of liberty and the various discursive strategies via which liberty was pursued.

## **4 Texts and discussion**

The first hypothesis I put forward relates to the interconnectedness between language and liberty and is stated as follows: in asymmetrical medical discourse, actors without power (diseased individuals and care-givers who bear bad news) have linguistic and pragmatic strategies for seeking liberty; that is, speaking the unspeakable (tabooed expressions) irrespective of whether or not their pronouncements will be viewed as ungrateful about their care, dissatisfied with their care, and in the case of care-givers, viewed as uncaring. It is argued that being able to speak the unspeakable offers the bad news bearers relief (communicational as well as emotional). We posit further that through their language, we are brought into the bad news bearers' worldview about how language and liberty inform each other. Even though two long extracts are cited below in support of the above observations, it is important to note that the cited extracts represent the many cases that help illustrate the above observations and others.

#### **4.1 Excerpt 1**

Context: Breaking the bad news about a tabooed disease (cancer). Recorded in Asuom (Ghana) September 12, 2015.


Hm! hm Deɛ what aba has.happened deɛ foc ɛso it.big(ger) sen than me. me

	- a. Obiara everyone nnim. neg.know Ebi some se say ɛyɛ it.be duabɔ, curse ebi some nso also se say ɛyɛ it.is nkrɔfoɔ those.people no their biribi thing no that bi. some
	- b. Sɛ whatever ɛyɛ it.be deɛn, what sɛ whatever ɛyɛ it.be deɛn, what obi someone nhu. neg.know

Mode you.take no her akɔ has.gone dɔkota? hospital

(7) T7 AB:

Yɛde we.take no her akɔ has.gone baabi every ara where Obiara everyone nhu not.see adekodeɛ. thing.exactly

	- a. Onipa Human wote you-be hɔ there na conj wo your honam skin ahyehye has.burnt.burnt ayɛ has.become tumtumtumtum. black.black.black.black
	- b. ayɛ become ammɔdin unmentionable
	- a. Sɛ whether ɔakɔtia she.has.gone.step biribi something so on o, conj sɛ whether deɛn what o, conj obiara everyone nnim. not.know
	- b. Yɛadeɛ disease yi this deɛ as.for ɛmfiri it.not.from ha! here
	- Now as.for God n'adom his.grace oo, interj n'adom! his.grace

(2.0)

a who menhia I.not.need

(11) T11 KN: Na And ɔwɔ she-be he? where

(12) T12 AB: Ɔhyɛ she.be.loc dan room no the mu in hɔ there

[KN enters AMD's room]

(13) T13 KN: AMD AMD ɛte it.be sɛn? how?

(14) T14 AMD: =Ei, interj Braa! Brother aa well mekonkɔn I.hang.in.hang.in hɔ. there (2.0)

(15) T15 KN: Apɔ joints mu in te be.stative sɛn? how (4.0)

(16) T16 AMD: Braa, brother me I na emp meni I.be oo! (like.this) Hm! interj (1.0) ɦm (well)

(17) T17 KN: Wo your ho self ho will.be yɛ you wo strong den. (3.0)

(18) T18 AMD:

	- a. Sɛ as wo you ara emp wonim; you.know obi someone a who mewɔ I.have m'dwuma my.work obiara anyone's mmoa. help
	- b. ɛnnɛ now/today hwɛ! look Ɛno that ara emp ne be sɛ that mete I.stay faako one/same.place

(19) T19 KN: Na wɔn de wo akɔ dɔkota?

and they take you have.gone hospital

#### (20) T20 AMD:

Ɛno that deɛ as.for mete I.sit wɔn them so on a if mesɔre. I.get.up Obiara everyone nim knows sɛ that wɔn they ayɛ have.done nea what wɔn they bɛtumi! can

(21) T21 KN:

Na and mokɔɛ you-went no when wɔn they se say dɛn what na emp ɛɛkɔɔso it.prog.cause yadeɛ disease yi? this (2.0)

(22) T22 AMD:


Na and dɔkotafoɔ doctors no the yɛɛ do.pst ho self biribi something maa for wo? you

	- a. Deɛ what wɔn they kaeɛ said ara only ne be sɛ that yɛmfa we.bring me me mmra come fie. home
	- b. Yadeɛ disease yi this ama has.made me me asa encumbered ama has-made me me ayɛ become abɔfra. child (2.0)

#### (25) T25 KN:

Me my sewaanom aunt.them boa help wo you anaa? q (4.0)

(26) T26 AMD:

Aaa well ɛyɛ. it.okay Wo you ara emp wonim; you.know ayɛ (it).has.become sɛ like wo your nsa hand atɔ has.fallen ɔsaman aduane mu.

ghost's food in

	- a. Kyerɛ meaning sɛ that ɛnyɛ it.not.be papa good bi exactly ara. very. (4.0)
	- b. Hm ɦm (2.0) Ɛbɛyɛ it.will.be yie well/okay (2.0)

(28) T28 AMD:

Aaa Well anomaa bird bi a ne and ne its su; cry; ɔse: it.says "Aaa Well yɛrehwɛ." we.are.looking

#### **4.2 Excerpt 1 (Translation)**


[KN Enters AMD's Room]


An observation of the above excerpt reveals the use of various pausal phenomena in an attempt to hold back the bad news given its communicative difficulty. Prominent among the different types of pausal phenomena are silence, hesitation or voiced pauses, and a combination of both. What is unique about the silent pauses is their duration. Specifically, the silent pauses are much longer than what is generally seen as normal in Akan interaction, which is between 0.1 and 0.5 seconds (Obeng 1987, 1989, 1999). In American English conversations, Jefferson (1983) identified 0.5 seconds as being the normal silent pause between a current turn and the next turn. The 4.0-second pause between KN's question *ɛbaa no sɛn?* 'what happened?' in T1 and AB's turn (T2) signals the difficulty in breaking the bad news. In fact, AB's first utterance is a voiced pause *hm* [ɦm] produced with a piano volume and a low pitch height which, in Akan, is used to preface or signal an upcoming constrained utterance. Thus, the use of the voiced pause, *hm* [ɦm], acts as a hedge and points to the fact that it is culturally neither congruent nor appropriate for AB to deliver the bad news at that point in the interaction given that her liberty to do so is impeded by the cultural conventions of delivering bad news. Besides the phonetic cues of silent and voiced pauses, we also have the syntactic cue of focus marking to project the bad news. The expression:

(29) Deɛ what aba has.come deɛ, foc ɛso it-big(ger) sen than me. me

'This news/tragedy is bigger/beyond my capability to break/deal with.'

Thus, we see from T1 and T2 the restriction placed on the bad news bearer in breaking the bad news. Note that an observation of T3 to T5 also confirms the communicative difficulty associated with the bad news delivery. Both T3 and T4 are produced by the same speaker, KN. Within conversation analysis (CA) theory and practice, the 3.0-second pauses between T3 and T4, and between T4 and T5, are referred to as initiative time latency pauses and point to a current speaker's termination of her turn and a next speaker's refusal to assume turn ownership, thereby forcing the current speaker to continue talking. The fact that AB comes in only after KN rephrases the question in T3 and poses it again in T4 points to the level of difficulty in delivering the news about AMD's illness. AB's answer (T5), *Obiara nnim. Ebi se ɛyɛ duabɔ, ebi se ɛyɛ nkrɔfoɔ no biribi no bi. Sɛ ɛyɛ deɛn, sɛ ɛyɛ deɛn, obi nhu* 'No one knows. Some say it is a curse, others say it is that (disease) which belongs to those people', is of considerable ethnopragmatic import: First, she resorts to avoidance via a factive construction *Obiara nnim* 'No one knows' in which the subject of the sentence, *Obiara*, literally means 'everyone' but idiomatically means 'no one' due to the negative particle *n*- that precedes the verb *nim* 'know.' Note that in Akan medical discourse, use of the quantifier *Obiara* is a form of number game used to support an assertion. In the current discourse context, the expression *Obiara nnim* 'No one knows' is used to signify the difficulty in breaking the bad news. If no one knows, then who is she, a non-medical person, or one without the power of divination about illness, to claim knowledge of it?

AB's next sentence in T5, *Ebi se ɛyɛ duabɔ, ebi se ɛyɛ nkrɔfoɔ no biribi no bi,* 'Some say it is a curse, others say it is that (sickness) which belongs to those people' also involves avoidance via the use of the non-specific determiner, *ebi* 'some (people).' Note that the use of avoidance also points to the intrusions on AB's liberty; she is not permitted to name the source or cause of AMD's illness given that delivering such news could result in communal disintegration, since in the studied community such diseases are attributable to relatives who are jealous of the diseased individual and therefore use a curse to make her sick or even cause her death. Not naming such people helps maintain harmony, even if only outwardly. Note also that AB is not responsible for the inferences and conclusions others may draw or have regarding the disease's causation. She expresses her non-responsibility for such conclusions by engaging in a disclaimer by way of attributing the conclusions drawn about the disease to others. The expression *ɛyɛ nkrɔfoɔ no biribi no bi* 'it is that (sickness) which belongs to those people' attributes the cause or source of a disease to a foreign source or other(s) rather than self or one's own group and enables the caregiver to blame those people from whom the disease originated for the disease's burden.

When asked whether they had taken the diseased person to the hospital (T6), AB responds in T7: *Yɛde no akɔ baabi ara. Obiara nhu adekodeɛ.* 'We've taken her to every place. No one knows what it is.' Use of the first person plural pronoun prefix, *yɛ*- 'we', suggests that the caregiving had been communal and that there had been no apathy. The second sentence which recapitulates her earlier statement about no one knowing what the disease was, is a marker of frustration and a request. The words *baabi ara* 'everywhere' and *Obiara* 'no one' which preface *nhu adekodeɛ* '(not) know what it is,' point to the exhaustion of all possible options about cure by all persons with no success; a waste of scarce communal resources for a hopeless cause. The above expressions also signal to KN that if he knows of any place (other than those already tried by the caregivers) that they (the caregivers) may not be aware of, then he must, as required by custom, take the diseased individual there for treatment. Thus, this is both an indirect request for KN to help and also a warning for KN not to blame the caregivers for not trying their best to help.

In T8, AG, another caregiver, describes the disease without mentioning the diseased person's name; she refers to her with the noun *onipa* 'person.' Use of the non-specific reference, *onipa* 'person,' instead of the third person pronoun, *ɔnʊ* 'she,' or the diseased person's name, affords the speaker the liberty to talk about the disease without getting into the specifics of naming the diseased person. Use of the reduplicative *ahyehye* 'multiple "burns"' describes the intensity of the disease as well as the multiple places where the melanoma 'burns' have affected the diseased person. Note also that the reduplication of the word *tumm* 'black' four times also describes the extent of the coloration of the skin by the disease; something which she goes on to describe as an unmentionable (that is, a taboo). Like the caregivers before her, she ends by also attributing the disease to a foreign source saying, *Yɛadeɛ yi deɛ ɛmfiri ha!* 'As for this disease, it is not from here.' She could simply have said 'it's not from here' or 'this disease is not from here.' Use of the focus marking expression *Yɛadeɛ yi deɛ* 'as for this disease,' adds to the uncertainty about the nature of the disease and its source. By attributing it to a foreign source, she is given the liberty to talk about the disease since any repercussion or shame associated with it is deflected to an unknown source.

It is important to note that up till T13, the disease, cancer, had not been named because it is considered a tabooed disease in Akan society. The fact that the name of the disease was avoided by the care-givers leads us to posit our second observation which is that: care-givers whose liberty to deliver bad news may be constrained, may leave the delivery of the bad news to the diseased individuals themselves in order to obviate crises. It is only after KN speaks with AMD that she names the disease. Specifically, it is in T22 that AMD mentions cancer, but even then, she resorts to uncertainty. When asked about what disease she had and what was causing it (T21), she hesitates first, pauses, and then mentions the possibility of it being cancer. She then goes on to explain the uncertainty about the disease type with the expression, *ebi se sei, ebi se sei. Edin bebree. Wobɔ din a ɛnyɛ yie.* 'some say this, others say that, so many names, it is an unmentionable.' The repetition of the construction *ebi se sei*, is to create and amplify the extent of doubt about the exact nature of the disease. Via the repetition, she appears to be saying that no one really knows what the disease is. Also, the expression, *Edin bebree* 'so many names', adds to the uncertainty about the exact nature/type of disease. It also points to the extent of the taboo nature of the disease or its incurable nature. If the disease has that many names, and if caregivers have not settled on a name, then it is, to say the least, a bad disease. Finally, the expression, *Wobɔ din a ɛnyɛ yie* 'it is impossible to mention its name' that is, 'it is a tabooed disease,' lends a further measure of support to the dangerous and/or terrible nature of the disease and how restraining its effect had been on AMD's life. In fact, one could argue that the nature of the disease intrudes on AMD's liberty to even name it without resorting to avoidance and circumlocution.

On the denial of her liberty, the following three expressions that were produced by AMD,


are discursively most significant given that they each express the diseased individual's (AMD's) recognition of her disease intruding on her liberty by making her needy, ruining her business, immobilizing her, and making her dependent on others for care and sustenance. From the discourse-pragmatic perspective, the phrase *me na me ni oo* 'Can you believe it's me you're looking at?' expresses incredulity about her physical appearance brought about by the disease.

From the above excerpts, we observe how AMD's utterances index the extent to which the disease had encroached on her liberty (negative and positive). From the point of view of valence, AMD's utterance, *me na me ni oo*, expresses the emotional burden brought on her by the disease and from the socio-economic point of view, the downward spiral of her social status from self-sufficiency to that of dependency. The interjection *oo* expresses self-pity.

KN's utterance in T17, *Wo ho bɛyɛ wo den,* 'You will be well,' is a speech act of assurance done via a declarative sentence to suggest confidence in what is uttered and intended to help AMD to not give in to the disease but to emotionally manage it. AMD's self-sufficiency and independence are expressed by the sentence, *Sɛ wo ara wonim; obi a mewɔ m'dwuma a menhia obiara mmoa* 'You know very well; some(one) who had her own business and was never in need.' This construction is made up of a factive formula, *Sɛ wo ara wonim* 'You know very well,' which establishes the truth or credence of the following utterance, *obi a mewɔ m'dwuma a menhia obiara mmoa* '(some(one)) who had my own business and was never in need;' her self-sufficiency. Her last utterance, *ɛnnɛ hwɛ! Ɛno ara ne sɛ mete faako* 'today/now look, all I do is stay at the same place,' suggests that her right to physical mobility has been taken away from her by the disease. Even though she does not mention the disease as having restricted her to her home, it is implicitly stated. The expression, *ɛnnɛ hwɛ!* 'today/now look,' which was accompanied by a hand gesture of showing both palms upwards and drawing them to herself, communicates her frustration at being paralyzed and bedridden.

In T25, KN asks about care-giving and one sees from AMD's answer the communicative burden inherent in the question. Responding directly that the care provided her by the relatives was insufficient and also not done of their own free will but out of 'social compulsion' would have marred her face and those of her caregivers. She seeks communicative liberty by: (a) resorting to hesitation, as in the sentence, *Aaa ɛyɛ* 'Well, it's okay;' and (b) using a factive expression, *Wo ara wonim* 'you very well know' which presupposes the truth of the following statement and consequently makes it impossible to refute the propositional content of the following complaint, *ayɛ sɛ wo nsa ato ɔsaman aduane mu* 'it's like putting your hands in a ghost's food'; meaning committing to something and hence not being able to discontinue the care-giving. Via inferencing, AMD appears to be saying that the care-givers are providing her with care because they are socially obliged and not because they care so much about her welfare. Indeed, KN, in T27, resorts to inferencing to explain AMD's previous utterance (T26) by explicitly saying 'that means the care isn't very good' and that forces AMD to be more direct by saying, *Kyerɛ sɛ ɛnyɛ papa bi ara* 'Meaning, it's not very good.'

Phonetically, the long silent pause of 4.0 seconds within KN's turn points to the fact that he was done with his turn and wanted AMD to take over the turn ownership. The fact that AMD did not take over the floor suggests the communicative burden placed on her by KN's line of questioning. Not being grateful to her care-givers is tabooed no matter how insufficient the care might be. When KN continues, he engages in the use of a voiced pause *Hm* [ɦm] which is also a hesitation and signals that the discourse thus far may have involved a face threat. Pausing within one's turn for 2.0 seconds after the issuance of the voiced pause also confirms the face-threat inherent in his utterance and hence a change in the discourse content from asking about the nature of the care-giving to providing assurance that all will be well via the expression, *Ɛbɛyɛ yie.* 'It'll be well.'

In AMD's final utterance she resorts to polyvocality via the aphorism, *Aaa anomaa bi ne ne su; ɔse: "Aaa yɛrehwɛ"* 'Well, it is a certain bird that cries/sings: We'll see,' where she attributes her doubt about a change in the nature of care to the 'cry' (song) of a bird that says, 'We'll see.' This aphorism has public knowledge of wide accessibility given that it is known by all competent adult native speakers of Akan. Basically, AMD appears to be saying something like, 'I will believe it when I see it.'

Next, we examine another excerpt with a view to also identifying the linguistic and discourse-pragmatic strategies used in delivering and managing bad news about a diseased individual with mental/emotional problems, and the communicative strategies used to deny and seek liberty.

#### **4.3 Excerpt 2**

Context: Breaking the bad news involving a socio-emotional and/or mental challenge. Recorded September 19, 2015.


(33) T4 AC:

Seesei Now deɛ as.for ɛnyɛ (it).neg.good koraakoraakoraakoraa. whatsoever-whatsoever-whatsoever-whatsoever

	- a. Wɔfa Uncle nipa people bi some yɛ be nipa people bɔne bad papapapapapa. very.very.very

(3.0)

	- now he.be where

ɛba (It).come mu in.(it.happens) saa that a, if ɔnkasa he.neg.speak obiara anyone ho; self yɛama we.have.made no him akɔhyɛ go.be dam room mu in

(37) T8 KO:

Moma You.pl.allow me me nkyea greet no. him

(38) T9 AS:

Wɔfa Uncle ɛnha not.worry wo your ho self na for ɔremmua he'll.neg.respond wo. you

[KO enters room]

(39) T10 KO:

YG; YG ɛte (it).be sɛn? how [KO returns from room]


Mode You.pl.take no him akɔ have.gone hɔspitl hospital anaa? Q

(43) T14 AC:

Baabi Some.where ara every nni neg.be hɔ there a that yɛmfa we.neg.take no him nkɔeɛ. neg-go

sɛ that

	- Enti So seesei now yɛreyɛ we.prog.do no it dɛn? what
	- a. Wɔfa Uncle wode you.take no him bɛfa will.go baabi somewhere a. if
	- b. ɔbarima man sei, this sɛ if ɔsɔ he.holds dadeɛ machete mu in a, if nso but yadeɛ illness yi this nti because.of ɔte he.stays faako one.place
	- a. Baabi Some-place bɛn where bio? else Adeɛ thing no that deɛ as.for wo you ara emp wonim you.know sɛ. that
	- b. Asɛe (it).has.spoilt awie; finish na and.so ɛhe where bio? else
	- a. Wɔfa Uncle ayɛ it.become den difficult ama for yɛn us yie! very Sɛ If ne his yadeɛ illness no the ba come a, if
	- b. na then yɛn us nyinaa all yɛn our ho selves hyehye burn.burn yɛn; us obiara everyone repɛ prog.want baabi somewhere akɔtɛ. go.hide
	- a. ɛha Here deɛ as.for baabi somewhere ara any nni (n.where) hɔ neg.be a there mode that no you.take bɛkɔ him will.go

b. akɔ go gya leave na so.that moahome you.rest kakra little

#### (51) T22 AC:

Daabi No oo. voc/interj Mmerɛ Time bi some wɔ be hɔ there mpo even a, if bosome month koraa even na emp mennaeɛ.

I.neg.sleep

(52) T23 KO: Hm. ɦm Oh oh diɛ! dear dis this iz is nɔt not guud! good

#### (53) T24 AS:

a. Den Hard deɛ, as.for ayɛ (it).has.been den, hard nso but ɔboafoɔ helper biara any nni neg.be ha here oo. voc.


#### **4.4 Excerpt 2 (Translation)**


[KO enters room]


A systematic observation of the above excerpt shows similarities with Excerpt 1 in terms of the features that are used to deliver bad news and to seek and protect diseased individuals' liberty. Thus, as in Excerpt 1, the linguistic features used for managing the bad news in this excerpt (Excerpt 2) include pausal phenomena, reduplication, factive expressions and Akan-English code-switching. We begin with pausal phenomena.

#### **4.5 Pausal phenomena**

In T2, AC signals the bad news first by using the voiced pause, *ɦm*, a hesitating pause that is used as a hedge and a signal of the upcoming bad news. This voiced pause is then followed by a short silent pause notated as (.). The long pauses of between 2.0 and 4.0 seconds that occur between the turns are used to signal or project upcoming bad news. The longer the pause, the more difficult it is for the bad news bearer to deliver it. For example, when asked how YG (the diseased individual) was doing (T3), AC in T4 paused for 4.0 seconds, signaling that YG was not doing well or that his health condition had not improved. When KO repeated his question about the whereabouts of YG in T6, AC paused for 3.0 seconds, again to signal the bad news about YG's health. What is important in this extract is how the long pause was followed by an evasive answer. Note that there are several places, such as T11 and T12, where a long pause of 3.0-second duration, an initiative time latency pause, is used because AC did not assume the turn occupancy and KO had to continue as a result of the extent of the bad news regarding YG's health and his behavior of not being communicative when suffering an (emotional/panic) attack from the disease.

#### **4.6 Reduplication**

Throughout the discourse reduplication is used by the studied participants to show the intensity or extent of badness of the news or the extent to which a diseased individual's physical or positive attribute has been destroyed by the negative effects of the disease. In T4, AC says: *Seesei deɛ ɛnyɛ koraakoraakoraakoraa* 'As for now, it is extremely bad.' Repeating the word *koraa* four times shows an extreme level of badness of YG's health status. Also, in T5, AS employs reduplication to show the extent of badness or evil nature of some people (witches or people with the spiritual capability to cause others to be sick). In the utterance, she notes: *Wɔfa nipa bi yɛ nipa bɔne papapapapapa.* 'Uncle, some people are ˈextremely bad.' I have used uppercase letters and bold together with a stress marker on the word extremely to show emphasis and extent of badness as expressed by AS. The intensifier, *pa* 'very,' is repeated six times so we have the following structure: [det + Noun + Copula + int + int + int + int + int + int + adj] i.e., some + people + are + very + very + very + very + very + very + bad. Repeating the intensifier six times suggests excessive badness. Thus, by engaging in reduplication we are led into AS' world view about disease causation (in this case, witches) and the extent of badness or evil nature of such persons. In her following utterance, AS engages in another form of reduplication by saying: *Aberanteɛ fɛfɛfɛfɛfɛfɛ hwɛ deɛ wɔn ayɛ no!* 'Such a ˈhandsome gentleman! Look at what they've done to him!' Here, AS uses reduplication to describe the extent of handsomeness of YG, and by implication to suggest the debilitating effect of the disease on him (YG). Like extremely, handsome in the above sentence has been bolded, capitalized and marked with a stress diacritic to show the extent of handsomeness. Note that *fɛ* means handsome or beautiful so repeating it six times shows extreme beauty or handsomeness. Thus, AS appears to be saying that if a gentleman as handsome as YG could be turned into such a monster such that his care-givers have to either confine him to a room or run away from him, then one sees the extent to which disease can intrude upon the liberty of both the diseased individuals and their care-givers.

Finally, in T20 AS notes, *yɛn nyinaa yɛn ho hyehye yɛn* 'literally, we all, our bodies burn-burn us' meaning 'we all feel uneasy' to project the bad news about the difficulty in caring for YG. The verb *hyehye* is a physical verb that denotes physical sensation and unveils both the physical and emotional burden of disease on all stakeholders in the care-giving.

#### **4.7 Factive expressions**

Factive expressions such as *wo ara wonim* 'you very well know' are often used to show credence or provide evidence about upcoming bad news, news about care, or a point about one's physical or emotional condition. In T19, AC notes: *Baabi bɛn bio? Adeɛ no deɛ wo ara wonim sɛ asɛe awie; na ɛhe bio* 'Where else? As for that thing (the disease), you very well know that it has already destroyed him.' AC uses the factive formula, 'you very well know' for evidentiality. She appears to be saying something like: if it is common knowledge that the disease has already destroyed YG, then there is no need to seek a cure or medical attention anywhere else; after all the die is cast!

#### **4.8 Akan–English code-switching**

As noted earlier, code-switching may be employed to deliver bad news given that it is easier to deliver bad news in a foreign language than in one's own language. In Excerpt 2 above, we observe the bad news, the FTA, and/or the communicative difficulty being delivered in English. Thus, KO switches from Akan in T21 to English in T23 by responding, *Hm. Oh diɛ! dis iz nɔt guud!* 'Hm, oh dear; this is not good' to acknowledge receipt of the bad news. By acknowledging the news was bad via English, the burden of speaking the unspeakable is lessened (Movahedi 1996).

From the discourse-pragmatic point of view, bad news is managed through avoidance whereby the diseased individual's name is not mentioned, and he is referred to by a non-specific name such as *Aberanteɛ no* (T3) 'the Gentleman.' The caregivers may also resort to evasion as observed in T6 and T7 where instead of answering KO's question as to the whereabouts of YG (the diseased individual), AC rather talked about what happened if YG (the diseased individual) was having a bad day.

On denial of liberty, we observed that the interactants used utterances related to forced imprisonment whereby the care-givers constrained the diseased individuals into rooms and denied them the opportunity to come out as shown in T7 where AC responded to KO's question about the whereabouts of YG (the diseased individual) with the statement, *ɛba mu saa a, ɔnkasa obiara ho na yɛama no akɔhyɛ dam mu* 'When he has an attack, he speaks to no one and we force him into a room.' By the action of the caregivers, we are led into the experience of the diseased individual and thus shown how a disease intrudes upon the positive liberty of diseased individuals (in this case, YG) by preventing them from participating in their family and communal lives.

Denial of liberty was also seen in situations where caregivers were forced to flee from a diseased individual and hide for their own safety, as expressed in T20, where AS, a caregiver, notes: *Sɛ ne yadeɛ no ba a, na yɛn nyinaa yɛn ho hyehye yɛn; obiara repɛ baabi akɔtɛ* 'If he has an attack, then we all feel trapped and we seek a hiding place (to be away from him).'

## **5 Summary and conclusion**

From the cited texts and discussion, we observed that the linguistic strategies used to break and manage bad news include such phonetic features as silent and voiced pauses of various lengths, and reduplication or repetition which is employed to show frustration, anger, and emotional valence. Lexico-syntactic features used in bad news delivery and management included adjectives of quality used both attributively and predicatively, intensifiers (often repeated), verbs denoting physical sensation, and factive formulae (for evidentiality and credence). With respect to discourse-pragmatic features used to break and manage bad news, we identified vague reference forms and avoidance (which was done via giving up on words, or pronoun mismatch where a pronoun such as *you* was used to index another pronoun such as *she* in order to avoid direct reference). Others included the speech acts of complaining, blaming (which was done through the use of distal deictics and/or innuendo), requesting, blame-shifting (that is, blaming others including witches for causing the disease or pain), use of quantifiers, and code-switching from Akan to English given the fact that face-threatening acts (FTAs) were perceived as being more tellable in a less familiar language or register (Movahedi 1996, Obeng 2008). Other discourse-pragmatic strategies used to break and manage bad news included polyvocality and inferencing through the use of such a non-specific pronoun as *biribi* 'something.'

Article 25 of the 1948 Universal Declaration of Human Rights mentions health as part of the right to an adequate standard of living. The United Nations General Assembly's (1966) International Covenant on Economic, Social and Cultural Rights also recognizes the right to health as a human right (United Nations General Assembly 1948). Given the above declarations and in view of the data examined for this study, it is reasonable to argue that disease encroaches on both the negative and positive liberty of diseased individuals and their caregivers. Thus, disease exposes the freedoms of diseased individuals, and sometimes those of their caregivers, to encroachment and takes away the right of diseased individuals (and sometimes their caregivers) to participate and share in their personal and communal activities.

Furthermore, we learn from this study that managing bad news requires extreme care in determining what questions to ask and how to ask them, how to assign blame, how to assert and assure, among others, given that someone's life could be on the line. Given the socio-emotional, financial and cultural burdens that disease puts on diseased individuals and their care-givers, it is recommended that an opportunity be made available for such individuals to interact with others in view of the fact that such an interactional opportunity could be therapeutic and serve as a communicative means to convey and manage speakers' socioemotional and physical states (Kalmykova & Mergenthaler 1998).

It is argued further that bad news management has relevance for language and liberty. Specifically, through bad news management, liberty informs language and through language, liberty becomes a reality. Indeed, viewed from the point of view of liberty (Berlin 1960) we have demonstrated that via their complaints, requests, and other speech acts, the studied diseased individuals and their caregivers sought both negative and positive liberty. On the one hand, the diseased individuals requested the right to be free from intrusions from their diseases and from their care-givers' actions such as forced confinement (negative liberty) and also sought the right to participate in their private, family and professional (business) lives. On the other hand, the care-givers also sought the right to be free from attacks by diseased individuals and a reprieve or lessening of their care-giving burden as seen in AS' utterance in T24 in Excerpt 2 where she said, *Woba sei a, na anidasoɔ aba* 'When you visit like this, then there is hope (of help or reprieve).' What is unique about AS' utterance is that it is an implicit request for help and KO's response, *Mɛhwɛ deɛ mɛtuni ayɛ* 'I'll see what I can do,' in that communicative context, was a promise to help.

What is also unique about this study is that even though Akan language ideology assumes that diseased individuals and people who are generally in need of help are not as communicatively powerful as their care-givers, in these recorded discourses, both the diseased individuals and their care-givers sometimes ignored the power asymmetry and sought both their positive and negative liberties. It is possible that both sides saw each other as being in it together and therefore ignored the power relations. Most importantly, we have learned from this study that the interdependent nature of Akan society and the socio-cultural requirements placed on members of the community to assist each other in times of need place members at the same camera angle, and that this trumps the societal power asymmetry.

From the point of view of the larger Ghanaian society, it may be argued that to protect diseased individuals' negative liberty and positive liberty, there is the need to put in place rights that prevent the effects of 'disease' and people from encroaching on the freedoms of diseased individuals as well as rights that empower them to participate in their family and communal lives. Also, medical personnel (especially, doctors and nurses), social workers, end of life care-givers, and family members caring for their sick relatives as well as scholars working in the health area must understand the discursive mores surrounding bad news delivery and bad news management in order to be educated about the intertwining nature of language and care-giving and to guarantee the liberty of all stakeholders in the care-giving ecology.

## **Abbreviations**


## **References**

Bays, Alison M., Ruth A. Engelberg, Anthony L. Back, Dee W. Ford, Lois Downey, Sarah E. Shannon, Ardith Z. Doorenbos, Barbara Edlund, Phyllis Christianson, Richard W. Arnold, et al. 2014. Interprofessional communication skills training for serious illness: Evaluation of a small-group, simulated patient intervention. *Journal of Palliative Medicine* 17(2). 159–166.

Berlin, Isaiah. 1960. *Four essays on liberty*. Oxford: Oxford University Press.


# **Chapter 19**

# **Lessons from the field: An insight into the documentation of Gurenɛ oral genres**

## Samuel Awinkene Atintono

Accra College of Education

The paper discusses my eight months' fieldwork experience of documenting endangered Gurenɛ (Mabia, Niger-Congo) oral genres, which include riddles and folktales, sung folktales, songs and ritual performances, between 2010 and 2012 in Bolga and Bongo in northern Ghana. It presents the documentary corpus of close to 100 hours of both audio and video recordings and discusses the strategies and challenges of documenting these genres. It is argued in this paper that though Gurenɛ, with a speaker population of over 600,000, is not endangered, its oral genres such as riddles and folktales are vanishing and deserve to be documented. The paper urges linguistic fieldworkers and language documenters to pay attention to such languages and not to focus only on endangered or moribund languages. The documentation corpus from this project has been archived at the Endangered Languages Archive (ELAR) at SOAS, London. The lessons from this project can be used to document these genres in other Ghanaian or African languages for the revitalization and preservation of these linguistic and cultural resources of language communities.

## **1 Introduction**

The paper provides some insights into the documentation of Gurenɛ oral genres based on eight months' fieldwork that I undertook between 2010 and 2012 in Bolga and Bongo in northern Ghana, West Africa. The corpus includes riddles and folktales, sung folktales, songs, daily traditional court trials and ritual performances.

Samuel Awinkene Atintono. 2022. Lessons from the field: An insight into the documentation of Gurenɛ oral genres. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 345–366. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393803

A key contribution of the paper is the proposal that fieldworkers and language documenters should not focus only on documenting endangered and moribund languages but should also pay attention to endangered linguistic resources in languages that are not classified as endangered. Ignoring these languages in our documentation agenda is a sure way to gradually lose some critical aspects of the cultural and linguistic resources of their speakers.

The paper also highlights some of the strategies used to document the Gurenɛ oral genres and the different types of genres documented. The corresponding number of hours of recording for each genre is also provided. This is the first major documentation of this kind in the language. It provides a means for the revitalization and preservation of the linguistic and cultural resources of the language. Of late, some of the speakers, especially the elderly and those living outside the homeland, have been using various channels, including social media (e.g. WhatsApp), music, YouTube, the internet, churches, and ethnic associations (e.g. BONABOTO, Terabuuriyele), as a means to preserve the language and present it beyond its immediate borders.

One goal of the documentation was to collect as many of the disappearing oral genres as possible, especially the riddles and folktales, sung folktales, and songs performed by women. These genres are fast disappearing in the communities as a result of the massive impact of modern life, motivated by the desire to adopt western values and commodities. Since there are no records of these genres and no opportunity for the younger generation to learn them, I saw this as an opportunity to document and archive them. A related goal was to make the audio and video recordings available to the community radio stations and community members so that they could take advantage of digital technology to learn the genres.

Another goal, as is the case with many other documentation projects (cf. Trilsbeek & Wittenburg 2006, Austin 2003, 2006, Himmelmann 1998, 2006), was to ensure that the materials documented in the project are transcribed, annotated with metadata and deposited in a modern digital archive (e.g. ELAR) to provide a lasting record and prevent the loss of the genres.

The paper is organised as follows: §2 provides the language profile, and §3 makes the case for the documentation of endangered genres. §4 discusses the fieldwork setting and activities, while §5 focuses on the documentary corpus of audio and video recordings and §6 discusses the strategies used in documenting the folktales. In §7, the challenges on the field are presented, and §8 concludes the paper.

## **2 Language profile**

Gurenɛ is classified as Mabia (Gur) and belongs to the Niger-Congo languages of Africa. It is spoken in northern Ghana, West Africa. It is sub-classified as a northwestern Mabia language, with its closest relatives being Dagaare and Moore spoken in Burkina Faso (Naden 1989, Bendor-Samuel & Hartell 1989, Bodomo 1994, 2004, 2020). It has about 600,000 speakers based on the Ghana Statistical Service Census Report (2010). This figure is evidently not accurate, since the census did not record the language that people speak but their ethnic membership. Of course, most of the ethnic members do speak the language and transmit it to their children. However, the problem with the ethnic membership criterion is that there is a lack of clarity in the label. For example, in the 2010 Census, an old ethnic label, Mole-Dagbani, was used to refer to all of the more than twenty different linguistic groupings in northern Ghana, of which Gurenɛ speakers are part. This label was used especially in cases where the speakers had migrated to other communities in southern Ghana. So, the people identified as belonging to the Gurenɛ ethnic group were those in the homeland during the census. This has led to a misrepresentation of the actual number of speakers of the language, hence my mistrust of the figure.

Further, there is also a tendency for smaller ethnic groups to identify themselves with their inner group for socio-political reasons and therefore to be missed in the count. For example, speakers from Bolga west, though they are Gurenɛ speakers, will usually identify themselves as belonging to the Kasina-Nankani ethnic group (they speak Kasim or Nankani) and indeed were not counted as speakers although they belong to this language group.

Today, there is evidence from my recent fieldwork (Atintono 2013, 2020), and Bodomo's (2004, 2020) fieldwork to suggest that the number of speakers could be as high as 800,000. Other anecdotal evidence comes from the community based on district health census and livelihood intervention programmes data which support our estimate and suggest that the figure is higher than reported.

Gurenɛ is one of the five dialects of Farefari, besides Boone, Nabt, Nankani and Taln (Kropp Dakubu 1995, Nsoh 1997, 2011, Atintono 2002, 2004, 2011, 2013, 2019). All five dialects are mutually intelligible. It is important to point out that in the language classification literature there has been a misrepresentation of the language name, leading to spellings such as Frafra, Gurenne, Gurune and Gureni. Farefari is a cover term representing all five dialects. However, Gurenɛ is privileged to have had a unified orthography as far back as 2001. Thus, Gurenɛ has been standardized and today is considered a language as a result of its current status of being codified. My position has been that it is a standard dialect of Farefari, as shown in my previous works (Atintono 2002, 2004, 2011, 2013). This is not new in linguistics, as standard dialects have emerged as languages due to the prestige of being used in writing.

The language is also studied in the universities (e.g. University of Education, Winneba) and the colleges of education in Ghana, but it is yet to be introduced at the early grade, primary, and secondary school levels as a subject of study. The College of Languages Education, Ajumako at the University of Education, Winneba is one of the main institutions in Ghana that train both undergraduates and postgraduates in Gurenɛ. The present documentation project focused on the Gurenɛ and Boone dialects as spoken in Bolga and Bongo.

## **3 The case for documenting vanishing genres in non-endangered languages**

Recent interest in language documentation and description among linguistic fieldworkers is a result of global concerns about language threat, endangerment and death (Hale et al. 1992, Himmelmann 1998, 2006, Grenoble & Whaley 1996, Crystal 2000, 2003, Woodbury 2003, Austin 2003, 2006, Gippert et al. 2006, Bowern 2015, Chelliah & Willem 2010, Essegbey et al. 2015). Consequently, a great deal of attention has been paid to language endangerment and documentation issues in the last few decades. Despite these efforts, as far as the literature shows, actual practice has not been balanced in terms of the languages documented across the continents or in terms of funding support for language documentation projects.

The attention of fieldworkers and the funding for documentation projects from organizations such as UNESCO, NSF, DOBES, and ELDP have so far been skewed towards documenting endangered languages in Australia and the Americas, with only a few projects on African languages (cf. Essegbey et al. 2015). There is also huge support for documentation projects that focus on severely endangered or moribund languages. In this respect, a critical defining criterion is the number of speakers: languages with very few speakers (about 1 to 100), all of them older and with no younger speakers, stand a good chance of receiving support from endangered language documentation funding agencies.

The reality, however, is that there are many languages in Africa with large numbers of speakers, from a few thousand to hundreds of thousands, that have many aspects of their linguistic resources endangered. Such languages do not fit into the documentation agenda and are usually left out. The argument that is pushed in the language documentation discourse is that such languages belong in language preservation and revitalization projects but not in language endangerment projects. I take a different position and propose that equal attention should be paid to such languages, and that they should be treated as endangered so long as there is sufficient justification that some aspects of their linguistic resources are vanishing or dying.

Gurenɛ is one such language: with a speaker population of over 600,000 it cannot be classified as an endangered language, yet certain aspects of its linguistic and cultural resources, such as riddles and folktales, are vanishing.

To put the discussion in context, it is important to state the defining criteria for endangered languages. A language is said to be endangered when it is at risk of disappearing within a generation or two, with only elderly fluent speakers and no younger generation learning or speaking the language (Thomason 2015: 4). There are various degrees of language endangerment or vitality as depicted in the endangered languages literature. In particular, Moseley's (2012) UNESCO Atlas of the world's languages in danger and UNESCO's (2011: 6) discussion of degrees of language endangerment in a document titled "Language Vitality and Endangerment" outline six stages, which are stated below:


Some language documentation experts such as Grenoble & Whaley (1996) also define degrees of language endangerment on a scale similar, though not identical, to UNESCO's: at risk, disappearing, moribund, nearly extinct, and extinct.

On language vitality, UNESCO (2011: 5) identified nine criteria: intergenerational transmission; community members' attitudes towards their own language; shifts in domains of language use; governmental and institutional language attitudes and policies, including official status and use; type and quality of documentation; response to new domains and media; availability of materials for language education and literacy; proportion of speakers within the total population; and absolute number of speakers.

Based on these classifications of the stages of language endangerment, it is safe to conclude that Gurenɛ is not endangered, because there is intergenerational transmission with both older and younger speakers. However, the language could be described as vulnerable because it is largely spoken at home and in other social contexts such as the market, daily interactions and social events such as funerals, ethnic association meetings, and naming ceremonies.

In terms of vitality, Gurenɛ misses only one of the nine criteria: the government accords the language no official status, in particular by not including it in the school curriculum at the primary and secondary school levels. Since 2002 the people have sent many requests and petitions to the Ministry of Education to include the language in the official policy on Ghanaian languages but have received no response.

Despite the fact that the language as a whole is not endangered, some aspects of its linguistic repertoire are critically endangered. They include riddles and folktales, sung folktales, songs, descriptive events and ritual performances. This is because these genres have not been practised in the community for over three decades. By 2010, only a few elderly speakers in the community (about five), the youngest being over 50 years old and the oldest over 70, had knowledge of the narration of the riddles and folktales in particular.

The various social contexts in which the folktales were narrated no longer exist. For example, they used to be narrated by grandparents in front of the traditional compound, with children sitting around the fire in the evening (around 7.00pm) after dinner. This was a source of entertainment as well as a means of inculcating moral lessons in children through the messages conveyed in the folktales. These genres are classified as oral literature, comparable to western written literature, and are used for expressing thought and for teaching history, collective wisdom, language and literature.

The traditional home setting or context for the narration of these tales has been altered due to the adoption of western modes of entertainment. This traditional mode of entertainment has long since been replaced by television sets and videos. This has further facilitated the endangerment and subsequent vanishing of the genres.

At the time of my documentation project between 2010 and 2011, as indicated above, only five (5) older men aged between 50 and 70 years had expert knowledge in narrating the riddles and folktales. However, by February 2011 one of these five experts, who had mentored the other four, had passed away. A contribution of the present work is that, but for this project, the community would have lost this genius narrator's knowledge of this endangered genre. African oral genres such as riddles, folktales and sung folktales are far less familiar in western cultures and are less documented.

The Gurenɛ community, like most other African communities, however, has no records of these genres for the younger generation to learn and practise. This lack of records partly motivated the documentation of these genres, both to provide a lasting record in an archive and with a keen interest in revitalising them.

The project also provided the opportunity for the recordings to be made available on CDs and DVDs for use by the community members, and this is helping younger speakers to re-learn these genres. Quite recently, some of the riddles and folktales have also been shared on social media such as WhatsApp and YouTube for use by speakers, and this is fast gaining ground. Most of the materials archived at the Endangered Languages Archive (ELAR) are open access for both the community members and the research community. In Ghana, there are no well-established digital archives except university online library repositories, which are often not open access. Nonetheless, I have made copies of the recordings available to the Department of Gur-Gonja Languages Education at the University of Education, Winneba, where Gurenɛ is taught.

For the reasons discussed in the preceding paragraphs on the endangerment of the genres, I propose in this paper that, although the language documentation agenda focuses on the endangerment of entire languages, it is also important that we pay attention to languages that may appear not to be at risk and yet have some important aspects critically endangered, as in the case of Gurenɛ. Indeed, the same can be argued for most of the Gur languages spoken in northern Ghana, as they have large numbers of speakers with intergenerational transmission and yet their oral genres are disappearing (see Bodomo 2004, 2020).

## **4 The fieldwork setting and activities**

The fieldwork took place in six (6) communities in Bolga and Bongo between February 2010 and July 2011. The communities were typically rural, except a few that were close to Bolga town such as Tanzui, Soe and Bukere, where I recorded some of the folktales and the funeral and ritual performance events. I spent eight (8) months on the field for the documentation. The first major field visit took place from February to June 2010. During this period, I recruited five documentation team members and one research assistant to assist in the recording on the field and the transcription of the data. They were trained in the basic skills needed to understand the project and to use the audio and video recording equipment in language documentation.

I also identified five (5) experts in folktale narration and two (2) groups of sung folktale narrators in the communities with the help of the community members. We made arrangements with these teams and did many of the audio and video recordings in their communities, either at a school compound or at the home of the performer. The folktale performances in these two contexts were staged events, since the natural or normal context for performance used to be in the evenings on occasions such as a gathering at the home of a newly married couple or of a family head, or when an important guest visited the community and stayed the night. The folktale performances were for entertainment.

Together with my documentation team members, I had narrative sessions with each of the narrators at least once a week for about six months, even though some of the narrators frequently cancelled work appointments due to emergencies. The communities were relatively far apart. Two of the narrators were from Soe and Bukere in Bolga, while one narrator was from Yorogo (about 15km north of Bolga), and the most talented narrator, who used to ply his trade in Bolga in the 1960s and 1980s, had retired to his village at Kansuo (Namoo), about 35km from Bolga on the fringes of the Ghana and Burkina Faso border in the Bongo district. The two sung folktale groups were from Beo and Sapooro respectively, both located east of Bolga.

While the folktale narrators had between two (2) and seven (7) members in a team, the sung folktale groups had between seven (7) and twenty (20) members in a group.

Apart from the five expert narrators whose tales we recorded, we came across by chance a blind narrator in Bongo who was identified by a community member. We had two recording sessions with him. Although he had a good knowledge of folktales, he did not have a team to support him like the other groups. He told us he learned the tales from his parents while growing up in the 1950s. He invited two of his friends to support him, but they were not good respondents.

Other events that we recorded during the period include interviews with some elders on funeral performances and the rituals associated with burial. We also participated in chieftaincy installation events and traditional court proceedings at the Bongo palace. The paramount chief of Bongo (Naba Lem Yaarum) was very enthusiastic about our project and granted us interviews on the history of the Bongo chieftaincy and also the rituals associated with the installation processes.

Recording women's songs was another major activity that we undertook on the field. A women's group from Sumbrongo, a town located about ten kilometers west of Bolga, agreed to perform for us. We had four recording sessions with them. While on the field, we also took the opportunity to document daily conversations and other spontaneous cultural events in the communities whenever community members drew our attention to them.

I also conducted elicitation sessions with other consultants to collect data specifically on the grammar and semantics of positional verbs for the writing of my PhD thesis, which was part of the award of the ELDP grant.

The follow-up fieldwork was from May to June 2011. This last visit was mainly to cross-check gaps in the data with the consultants and to do a few more elicitations and recordings of the folktales. The transcription of the recordings and the final preparation of the metadata constituted the main activities during this visit. It is important to point out that fieldworkers should ensure that the thin and thick metadata are properly recorded while they are on the field. On the follow-up field trip, I had difficulty identifying the names of some contributors, the places where the data were recorded, and even dates and times. This is because we did not record some of the metadata properly while we were on the field.

Unfortunately, by the time of my second visit, the expert folktale narrator from Namoo (Azulemania) had died, in February 2011. I had planned to meet him to record a few more tales and cross-check a few things with him, but it was too late. The community members were, however, happy that my documentation record would help to preserve some of his folktales for future generations.

## **5 The audio and video corpus**

The documentation project produced a large volume of both audio and video recordings of a variety of genres. These are shown in Table 1 and Table 2 below. Most of this corpus has been transcribed and annotated using ELAN. It was archived with metadata at ELAR, SOAS in 2012 and can be accessed at https://www.elar.soas.ac.uk. Though the archive is open access, you will need permission from the archivist, for identification purposes, to gain access.

Notice that in Tables 1 and 2, there is a difference between the audio and video recordings in terms of the length of recording. There are a number of factors that account for this. First, it is much easier to do audio recording on the field than video. Further, as you might have observed, some of the genres like ritual genres are not on video because we were never permitted to use video to record.


Table 1: Audio recordings of the oral genres



Besides, the type of equipment you use will also affect how much data you can record. In my case, I used a Panasonic video camcorder, which could only record on mini cassettes with a maximum of one hour each. So, there were instances where we ran out of cassettes. The digital audio recorder, by contrast, used SD cards and could record for between three and five hours depending on the capacity of the card. This is the advantage of the audio recorder over the camcorder. Even if we had had a camera that could record for longer, there was also the problem of the time of day. Most of the recordings in the communities took place in the late afternoon (from 4.00pm onwards), running late into the night. By tradition, the folktales were narrated at nightfall after dinner. Once night falls, it becomes difficult to do video recording without sufficient light. Also, the fact that my team and I were operating in rural settings made it difficult to have lights.

Apart from the audio and video recordings, I also took about one thousand (1,000) still pictures, of which two hundred and twenty-five (225) have been archived. The pictures depict different scenes of the folktale narration sessions, sung folktale performances, women's song performances, cultural events such as funerals and chieftaincy installation events in Bongo, and the elicitation sessions. Figures 1 to 6 below present examples of the scenes of the folktale narration sessions and other events. The arrangement is typically a semi-circle or a circle, with the narrator and his team in the front row (Figure 1 below) while the audience sit behind or in front, leaving a space at the centre into which the audience or the narrators may occasionally step to dance to a folktale song.

## **6 Strategies for documenting the oral genres**

In this section, I discuss some of the strategies that were employed in documenting these genres. The first strategy involves consultations with the gatekeepers in the community in order to gain access to both consultants and places for the documentation. They include chiefs, elders, and some local political leaders; in the case of Ghana, this includes the Assembly member who represents the community at the district assembly. Being a native speaker or a member of the community does not necessarily guarantee you easy access to people and places. I belong to the community, but the fact that I had returned to document these genres meant I needed to obtain permission from the community leaders before I could start the project. The only advantage that I had as a native speaker was that the people were very receptive to me, and in most cases I also knew whom to contact for particular information.

Figure 1: Folktale narration session at Kansuo (Namoo, Bongo) by Azulemania (seated 3rd from left first row in light blue shirt, arrowed) with his team members. At the extreme right row (1st row) monitoring the recordings are Samuel Atintono (the documenter in red and black T-shirt) and James Akolgo (white and green T-shirt), May 2010.

Figure 2: Riddles and Folktale narration session at Yorogo near Bolga by Apia (middle in singlet and arrowed) supported by his team members in the front row left and right. Picture taken in July 2010.

Figure 3: Adagesaana (extreme right) from Bolga Soe with his colleague Narrator Nsoh Atimbila from Bolga Bukere (second from right) at a narration session. Extreme left is Philip Anangina who has been my key documentation team member in Bolga and his friend Mba (my former student) who accompanied us. Picture taken in April 2010.

Figure 4: Sung Folktale performance session at Sapooro by Abugebiire (arrowed) and his team members. Picture taken in April 2010.

Figure 5: Sumbrongo women performing women songs with the documenter (Samuel Atintono) dancing. Leader of the group (Nsoma, arrowed). Picture taken in April 2010.

Figure 6: A traditional court trial on a land dispute at the Bongo Chief's Palace, with the chief seated at top right. The documenter, Samuel Atintono, is first from right in the first row. The two litigants are arrowed. The rest are the chief's elders (Picture taken in April 2010).

There is also the issue of making clear the goals of the project by the fieldworker to enlist the support of the community. In this respect, you must build trust between you and the community members. They are only willing to cooperate in the execution of the project when they have trust and are convinced about the goals of the project. Some members complained that previous researchers who came to record some data never returned to the community. However, they later saw their materials in publications or even in documentaries shown on national TV without their consent.

The fieldworker must also demonstrate transparency about the end result of the documentation products. The community members needed to know where the recorded folktales would end up. Some members complained about previous researchers not allowing them access to the recordings that they had taken from them. In my case, I made them understand that after the documentation they would be given some of the audio and video recordings, and I did give them the DVDs after the project. It is important to give community members some of the products created out of the documentation, such as DVDs of audio and video recordings, simple literacy materials, wordlists or dictionaries. This way, they will appreciate and support the project.

Another strategy adopted in the fieldwork was to enlist the help and support of local people with a strong interest in the documentation and preservation of the language and cultural resources. The five people that I recruited were all self-motivated to participate in the project, and this helped me to record many genres. They could go on their own to document some events without my presence. When you involve active community members in your documentation project, they may have privileged access to some community events that the fieldworker will not be allowed to participate in and record. A typical example in my documentation was a situation where I wanted to document ritual genres associated with the burial rituals in Bolga, and the pallbearers would not accept an uninitiated person observing the rituals. Fortunately, one of my documentation team members had been initiated as a pallbearer, so he had the privilege of participating in and recording the rituals and also interviewing the expert pallbearers.

Promoting the documentation project on local radio or TV stations in the community, where possible, can also help to whip up community interest in the project. It also gives community members the opportunity to become aware of the project goals so that they can support it. It is helpful to let the local team members do the talking about the need for the documentation project. I had a weekly programme of about 45 minutes to discuss my documentation project on a local radio station called Gurenɛ Style in Bolga, which ran its programmes using only Gurenɛ. The response from the community members was very impressive: it led people to discuss why families should speak to their children in Gurenɛ and to suggest ways to revitalise and preserve the folktales. This proved to be very helpful, as those who listened to the programme expressed their appreciation for the initiative and also pointed out to me other experts knowledgeable in some of the genres for further contact.

Another crucial thing that I did was to play some of the folktales that I had transcribed on the radio stations, and it generated a lot of interest. People were surprised to hear these dying genres being played on the radio. I left many of the audio recordings of the songs and the folktales with the radio station to play during our programme time after we had left. Other radio stations soon learnt about them and also collected them to play at their stations. The outcome is that many people became more interested in the use of these folktales.

## **7 Challenges on the field**

As observed by Bowern (2015), every fieldwork situation has its own unique problems. There were a number of challenges that I encountered during the fieldwork which affected the progress of my work. They range from consultants' work schedules to equipment malfunctions. I discuss each of these issues below.

Disappointments from my consultants with respect to punctuality or the postponement of meetings were a major factor. You may schedule a meeting at 1.00pm and they turn up at 2.00pm or later. In some instances, they may not even turn up. Some consultants are also very difficult to track down for work. An example is one folktale narrator in Bolga whom I had to follow for six weeks before I was able to get him to start the narration of his tales. He would schedule an appointment for a meeting on a market day, which comes every three days, but whenever I met him he would give an excuse and request that I buy him a local drink. He was a very slippery person but very knowledgeable in the riddles and folktale genre, so I had no option but to continue to look for him on every Bolga market day and buy him the drinks until, in the sixth week, I finally got him to start the narration.

As a fieldworker, you need to be patient with consultants, and you must adjust to their activities and schedules to succeed in your fieldwork. Some other consultants had emergencies and could not honour appointments.

Another challenge that I had was the observance of cultural norms and social practices in the communities. While on the field, I noticed that whenever there was a funeral in any of the communities in which I worked, there would be no work for a few days. In particular, if the funeral affects the consultant's family members or their neighbours, you may be required to attend and express your condolences. You will also be required to make a small donation of cash or provide a local drink. The period of my documentation coincided with the performance of funerals in the communities, and I frequently encountered these situations, which made me postpone scheduled appointments.

My consultants very often invited me to other social events, such as weddings and birthday or anniversary celebrations of churches or prominent community members. The fieldworker's physical presence is often very much appreciated, and there was an expectation of a donation in cash or kind. This could be time-consuming and involved some cost. Even though these events do not relate directly to the documentation project, they help to maintain a good working relationship with the community. It is also important to observe these cultural norms and practices in order to avoid breaching them.

Another somewhat disruptive social obligation was invitations from friends to socialising events such as drinking sessions at local pubs. I was often busy working on my documentation data and hardly had time to attend, but my friends felt that I must honour their invitations, otherwise I would be labelled as showing off. This requires some negotiation, attending some events and declining others; otherwise it will derail your work plan.

In the documentation literature, a lot has been said about how to record events and the type of equipment to take to the field for optimum results (Bowern 2015, Woodbury 2003, Austin & Grenoble 2007). It is important to be aware of the type of event that you can record and of the type of equipment that is appropriate to use. Recording some cultural performances, such as the traditional war dances in which the performers run across the field (see Figure 7), was very difficult because the fast movement made it hard to position the camera at a good angle.

Also, the dancers sang while performing, and this, coupled with the noise of the enthusiastic crowd that followed them, created excessive noise. Running to keep pace with them while recording made the images unstable. The only time we obtained good images of the war dancers was when they performed in a circular movement at the front of the compound. Further, there were instances where members of the audience deliberately shouted or walked in front of the camera lens just to be captured in the video.

Figure 7: A group of war dancers (the leader is in front) from Bolga Soe at a funeral. Picture taken in May 2010.

Another crucial issue is to ensure that the recording equipment is constantly in good condition for an efficient workflow while in the field (cf. Bowern 2015). Even if you have charged your batteries and tested them the night before going to the field, it is important to monitor the equipment frequently because the weather conditions can affect its performance. The soft plastic case of our audio recorder melted easily under the heat of the tropical sun, and this sometimes caused the recorder to malfunction. In northern Ghana the weather usually gets hotter between April and June, with temperatures peaking at 30–40 degrees Celsius. Audio recorders with metal or hard plastic casing are therefore much better under such conditions.

Another important issue for a fieldworker is how to manage a reliable power supply for the recording equipment. In some of the communities where I worked there was no electricity, and in others the supply was very unstable and could go on and off every 30 minutes. I used lithium and alkaline rechargeable batteries for the audio recorder, but there were instances where the batteries were completely depleted even though they had been fully charged. You need sufficient rechargeable batteries for the digital audio recorders and camcorders, and you must remember to charge them overnight. I found that lithium batteries were far better than alkaline ones. In the tropics, heat can cause batteries to discharge faster.

As Figure 8 shows, the documenter and his team member monitored the battery level of the camera and ensured that it was actually recording the event. It is important to ensure that you do not lose your recording because the battery or the camera has stopped working.

Figure 8: Samuel Atintono (left, documenter) and James Akolgo (right) monitoring the video camera during a folktale narration session at Namoo, Bongo. Picture taken in May 2010.

## **8 Conclusion**

This paper has discussed my personal experience of documenting Gurenɛ oral genres in northern Ghana and noted its significant contribution to saving disappearing genres such as riddles, folktales, sung folktales and songs. Although Gurenɛ may not be classified as an endangered language, because it has a large number of speakers and intergenerational transmission, some of its linguistic resources, including riddles and folktales, are vanishing: only a few elderly people living today have expertise in them, and their knowledge dies with them.

The Gurenɛ documentary corpus contains many hours of both audio and video recordings of different types of oral genres, e.g., riddles, folktales, songs, ritual texts, traditional court proceedings, descriptive texts and cultural performances. Most of these data have been transcribed, annotated and archived.

The paper has also outlined a number of strategies required for successful fieldwork, which include community entry protocols, building trust among community members, ensuring the transparency of the project, community members' participation in the project, advocacy, and rewarding community members with the documentary products. Some challenges that can significantly affect the progress of work in the field have also been discussed. They include delays arising from consultants' work schedules, the observance of cultural norms in the community, participation in social events, and managing recording equipment to ensure an efficient workflow.

## **References**


Kropp Dakubu, Mary E. 1995. *A grammar of Gurune*. Legon: Language Centre.

Moseley, Christopher. 2012. *The UNESCO atlas of the world's languages in danger: Context and process*.


Trilsbeek, Paul & Peter Wittenburg. 2006. Archiving challenges. In Jost Gippert, Nikolaus P. Himmelmann & Ulrike Mosel (eds.), *Essentials of language documentation*, vol. 178, 311–335. De Gruyter Mouton.

UNESCO. 2011. *Atlas of the world's languages in danger*. Paris.

Woodbury, Anthony C. 2003. Defining documentary linguistics. *Language documentation and description* 1(1). 35–51. http://www.elpublishing.org/PID/006.

# **Chapter 20**

# **Dialogue with ancestors? Documentation data from Akie in Tanzania**

#### Karsten Legère<sup>a</sup> , Bernd Heine<sup>b</sup> & Christa König<sup>c</sup>

<sup>a</sup>University of Gothenburg <sup>b</sup>University of Cologne <sup>c</sup>Goethe University Frankfurt

The Akie language (*khúúti táa Akiyé* 'language of Akiye people') is a small Southern Nilotic language spoken in Central Tanzania (Manyara and Tanga Region) by approx. 300 people (among them 90 persons rated as language experts and guardians). Since 2009 this critically endangered language has been studied in a project funded initially by SIDA and since 2012 by the Volkswagen Foundation as part of the DoBeS initiative. The project focus has been on Akie documentation, mainly making audio and video recordings of a wide range of multifaceted speech events. In the recordings, a number of discourse markers (DMs) were identified. Of particular interest is the marker *hm*, whose role resembles that of an English sentence adverb like 'yes' or 'no'. Its use is quite limited, being mainly selected by the late Lesakat and other elders. Its main function is establishing contact with an imaginary target group called *asííswe* 'ancestors'. The latter are said to be always present whenever something happens in the community. Their presence is in particular assumed during rituals such as the blessing of, e.g., beer and weapons, or wishing people a safe journey when travelling. Taking the presence of *asííswe* into account is a custom which is deeply rooted in Akie traditions and belief. Accordingly, the elders invite ancestors (often by name) to have a drink and some food, before they are requested to give way to the guests of the blessing. It is also a sort of appeasement, because ancestors can be harmful if they are not properly respected. In the blessing ceremony, the performing (fe)male elder answers on behalf of the *asííswe* with *hm*, which is an imaginary confirmation of somebody's presence, similar to roll calls in English. In the paper, samples of Akie texts and their English translations illustrate the linguistic component of the ceremony.

Karsten Legère, Bernd Heine & Christa König. 2022. Dialogue with ancestors? Documentation data from Akie in Tanzania. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 367–381. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393805

## **1 The Akie community and their language**

This paper deals with a ceremony recorded several times on video and sound among the Akie in Tanzania. The Akie community is a small marginalized hunter-gatherer ethnic group that lives mainly in remote settlements of the Manyara (Kiteto and Simanjiro Districts) and Tanga Regions (mainly Kilindi District, a few villages also in Handeni District). The Akie language is included in *Ethnologue* (Simons & Fennig 2018) as an independent entry, based on information collected by the authors until 2016. Simons & Fennig classify Akie (*khúúti táa Akié* 'language of the Akie people') as Southern Nilotic. The language is spoken by approximately 300 people. Of these, 90 individuals are rated as language experts and guardians, while the remaining 200 speakers have average competence in the language. Another one hundred people have a limited competence.<sup>1</sup> Further language-relevant details listed by Simons & Fennig for Akie are as follows, but note that the language use information and the number of speakers have been updated above:


<sup>1</sup>This data is the result of field work which was conducted in nearly 60 villages or settlements where the names of Akie speakers or people of Akie origin (identified as Il-Tórobo by Maasai neighbors) were recorded.

<sup>2</sup>This is a modification of the Expanded Graded Intergenerational Disruption Scale (EGIDS) classification cited in Simons & Fennig (2018: 7), namely, to move Akie from Level 7 (Shifting) to Levels 8a–9 (Moribund–Dormant) instead. This is because, where they live, the few fluent users of Akie in Tanzania are older than child-bearing age, so it is too late to restore natural intergenerational transmission through the home. A mechanism outside the home would need to be developed to achieve this.

Figure 1 shows the spread of the Akie language.

Figure 1: Distribution of the Akie language.

## **2 The Akie documentation project**

The texts presented below form an integral part of the work on documenting the Akie language. Initially (from 2009 to 2011) the Akie documentation took place within the framework of the Languages of Tanzania (LoT) Project (Gothenburg University-University of Dar es Salaam, Department of Foreign Languages and Linguistics cooperation; Gothenburg coordinator K. Legère) funded by the Swedish International Development Cooperation Agency (SIDA). Subsequently in 2012, the Volkswagen Foundation assumed funding of the Akie documentation as part of the Documentation of Endangered Languages (*Dokumentation Bedrohter Sprachen*, DoBeS) initiative, by way of two separate studies, namely "Akie in Tanzania: Documenting a critically endangered language" (2012 to 2015) and "Akie in Tanzania: Updating the documentation of a critically endangered language" (2017 to 2019). The fieldwork results, which are derived from the aforementioned 60 settlements in the Manyara and Tanga Regions, culminated in a large collection of sound, video and text files as well as photographs, which have been deposited with the DoBeS Archive at the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands. The DoBeS Archive forms part of the United Nations Educational, Scientific and Cultural Organisation (UNESCO) Memory of the World (MoW) Programme.<sup>3</sup>

According to Himmelmann (2006: 1), "a language documentation is a lasting, multipurpose record of a language", reflected in a wide range of multifaceted communicative events. Such documentation aims at providing "a comprehensive record of the linguistic practices characteristic of a given speech community" (Himmelmann 1998: 166). The language documentation target is the collection of raw/primary data in recording sessions (both audio and visual) that encompass "as many and as broad a range as possible of communicative events" (Himmelmann 2006: 7). It goes without saying that recordings as suggested here are the background of the Akie project against which the subsequent transcription, translation and linguistic analysis of the material take place.

The documentation approach is well suited to making available a variety of linguistic elements from recorded communicative events such as narratives or interviews. As for the discourse marker (DM) discussed below, it is assumed that eliciting it would be difficult or even impossible, since any DM use is tied to the flow of discourse, such as the spontaneous utterances in a conversation or a narrative. An interesting example of this kind of communicative event can be found in the recordings made in Napilukunya, Ngababa and Losekito (the Akie name of Gitu, one of the main Akie settlements north of Kibirashi; see Figure 1) during blessing/offering rituals that focus on ancestors. Such a ceremony is referred to as *kanaítaísɛ* in Akie (*tambiko*<sup>4</sup> in Swahili).

## **3 Ancestral cult**

#### **3.1 The socio-anthropological perspective**

The audio and video material and its transcription in §3.2 provide an example of how the Akie, through their relationship to their ancestors, pay special attention to a specific social element as an essential factor of their identity. For the community as well as the individual, respect for the deceased and the maintenance of the associated tradition of involving the ancestors in essential social spheres play an important role.

From a socio-anthropological point of view, Ferraro (2005: 364) comments on the position of ancestors in the form of ancestor worship as follows:

<sup>3</sup>https://archive.mpi.nl/islandora/object/lat

<sup>4</sup>Described by Johnson (1939: 449) as an "offering of oxen, … beer … made to propitiate the spirits of the dead".


In addition, Ferraro (2005: 365) points out:

• Ancestors punish individuals who violate behavioral norms.

This general overview also applies to the Akie community in up-country Tanzania, where contact with the ancestors is still maintained. Aspects of the ancestral cult among the Akie are dealt with in the following subsection.

#### **3.2 The Akie community and ancestral worship**

Blessing/offering rituals called *kanaítaísɛ* are an indispensable element of Akie tradition and culture. Such ceremonies are deeply rooted in the belief that the community must continuously be assured of their social values and harmony with their neighbors, nature, etc. During these ceremonies, *Tororeita* ('God') and especially *asííswe*<sup>5</sup> ('ancestors') are asked to bring peace, well-being, health, enough food and local beer to the community, while anger, intolerance and other offensive habits are rejected. Local beer is an important component of the ceremony itself and is also drunk directly afterwards. The ceremony is also performed before men go hunting, when someone is expected to travel, and when guests come to visit. In the latter case, the ceremony is held for visitors to familiarize them with the way the Akie community cultivates and respects traditional values.

Akie community members who were consulted emphasized that their ancestors play a substantial role in these ceremonies. They are said to be present whenever something significant happens, and they are always hungry and thirsty. There is also the belief that *asííswe* turn into monsters if proper care is not taken of them. Thus, the main function of the ceremony is for the community to establish contact with this imaginary ancestral target group.

<sup>5</sup> Singular *asííswante*.

In a videotaped conversation on 2016-02-25 in Mbeli<sup>6</sup> with his fellow elder Nkoiseyyo, Lesakat (L), who was one of the pillars of Akie culture and tradition, summarized the role of *kanaítaísɛ* as follows:

(2) L.:

	- ... nan korio nen kanaitaisee, inaitaise phi chaa eech, duo phaai. '... and now at the offering/blessing ritual, adults, in particular elders, make it.'
	- Kae ko khuumii, kiale. 'There is beer, it is bought.'
	- Ko chichee kiamchin lokhoo, kiinaitaise. 'And, we tell them about everything new when our ritual is made.'

The ceremony usually takes place after sunset, i.e. at night. As mentioned above, an obligatory prerequisite is beer (*khúúmi*) in a container, such as a steel/aluminum drum or an old plastic Sadoline paint bucket. In the case of the ceremony in Napilukunya and Ngababa, community members brew beer from honey. As a rule, the ceremony is performed by at least one Akie elder (male or female) or someone familiar with the blessing/offering tradition. At the beginning of the ritual and thereafter at regular intervals, the performer dips *ratiɲántɛ*<sup>7</sup> (the fruit of the sausage tree, *Kigelia africana*) in the beer, sucks in the liquid and then spits it out. After the ceremonial monologue all those who attend the ritual spend the evening together sharing food and drinks. The structure of the rituals and performance as well as the linguistic focus of the communicative event documentation will be dealt with next.

<sup>6</sup> See the DoBeS/Akie video deposit: 2016\_02-25\_Mahojiano\_Mbeli\_Lesakat\_munimuni.mpeg, https://hdl.handle.net/1839/b41cca45-3c9b-4d78-b45f-7e650850cac4.

<sup>7</sup>The Akie tree name is *sangaratwe*. In Swahili the tree is *mwegea* and its fruit *yegea*.

## **4 The ancestral cult among the Akie and their linguistic constituents**

Below are transcripts and idiomatic translations of three ceremonial discourses in Akie that were recorded audiovisually. A prominent feature of the texts is the respectful, polite way of addressing the ancestors.

In the first text<sup>8</sup> L as the performer introduces the guests from far away<sup>9</sup> (called "white giants") to the *asííswe*. Similarly, L welcomes the ancestors, inviting them to eat and to drink *khúúmi*. They are asked to bless the community and clear the way for the ceremony, the participants, and others. The ancestors are finally expected to leave in peace.

The polite, respectful way of speaking to the imaginary target group of *asííswe* is demonstrated in parts 1, 3 and 5 of text 1. When the recordings of this and other ceremonial monologues and their transcriptions were analyzed, it turned out that a specific linguistic element occupied a prominent position in this communicative event. This was the discourse marker (DM) *hm* which, on the one hand, was traced in these parts as paying attention to the ancestors in attendance in general. On the other hand, in parts 2 and 4, several *asííswe* were addressed by name. Thus, the text culminated in a sort of roll call, where, as a token of confirming *asííswe's* presence, attention and acceptance, a mock "dialogue" was generated by L. In so doing, for those ancestors whose names were mentioned L imitated a fictitious *hm* answer after the call. In this case, in an English roll call the DM *hm* would be the equivalent of 'yes' (categorized as an answer particle), 'here' or 'present'.

In general, at the offering/blessing ritual, the use of DM *hm* is mainly associated with the English affirmative sentence adverbial 'yes'. When this lexical function of DM *hm* was discussed with Akie resource persons such as L and Bahati Nguyaki (BN), both confirmed this interpretation.<sup>10</sup>

<sup>8</sup>The focus of text 1 and the following texts is on the presentation of a logical flow of words that is not interrupted by glossing. In response to a suggestion by one of the manuscript reviewers a short selection of glossed text is included in the appendix which is expected to demonstrate some elementary facts about Akie language structure.

<sup>9</sup>Karsten Legère (KL), P. Mkwan'hembo (MK), L. Ole-Wanga.

<sup>10</sup>With respect to DMs including *hm* see also Heine et al. (2017) and König et al. (2015: 137–146).

	- a. Ɔ́pwan nái, ɔɛɛ́ sien nen ɪʊ. ́ 'Come now, drink here.'
	- b. Ɔɛɛ́ sien nái, s'ɔítɛ kɛɛn, ɔ ́ ́ɔ́pɛ. 'Drink now, so that you may move, leave.'
	- c. Ɔ́pwan, ɔɛɛ́ sien nen ɪʊ, asííswɛ ́ , amʊ akílɛ ɗe: ́ 'Come, drink here, spirits [ancestors], because we have just said:'
	- d. Kápwa tián chʊ lɛɛlách ii. Hm. 'White giants have come. Hm.'
	- e. Kɔ akwɛ́i, kikúúrɛ amʊ ii. 'It is you who have been called for this.'
	- f. Kɛɛ́ pwa, kɛ ́ ɛ́ le: ́ 'They came to say:'
	- g. ɔmwáun ɗe lɔkoíywɛ tukul. Hm. 'Just tell us all words. Hm.'
	- h. Kɔ́pwa kɔ́riekis. Hm. 'They came to brew beer. Hm.'
	- i. Koúp, anasínan, amíítwaaki tukuul. Hm. 'Folks, they brought in all the food. Hm.'
	- j. Ɔ́ɛɛ́ sien nái, si ɔŋɛɛtítɛ, ɔ ́ ́ɔ́pɛ, hm, si ɔkóónu kurúɽta. Hm. 'Drink now so you may get out and go, hm, so as to give us health. Hm.'
	- k. Ɔítɛ kɛɛn akwɛ, asííswɛ ́ , hm, ɪkaa laŋatáá ni. Hm. ́ 'Move yourselves out, ancestors, hm, of tonight. Hm.'
	- l. Ɔmáátɛɛn asíísi. Hm. 'Follow this sun. Hm.'
	- m. Amu kɔ akwɛɛ́ , kakípwan kɛ ́ ɛ́ saam kuutii. Hm. ́ 'Because it is you whom we come to ask for the language. Hm.'
	- n. Akɛɛ́ saam kíí ya kimáchɛ. Hm. ́ 'We ask for what we want. Hm.'
	- o. Kɔ́wa kɔ́puɽ íyya kɛnkɛɛni. Hm. 'It became stalled in one place. Hm.'

Apart from its role at the blessing/offering ceremony, which is commented upon in this paper, *hm* is also used in narratives, where its function is the confirmation of the speaker's wording and text content. Like *ehee* in Swahili, DM *hm* here encourages the narrator to go ahead. In this case, Heine et al. (2017: 152) classify *hm* as a connective DM expressing discourse continuity and introducing a new information unit, translated as 'and then'.


'We do not want just now, hm, because we are not just people who are from the open place. Hm.'


Ɔ́pwan nái, pháápha, hm, pháápha Sapáí, hm, Lɔitiakí, hm, Kaíyakaí, hm, Namaiyai, hm.

'Come now, Father, hm, Father Sapai, hm, Loitiaki, hm, Kaiyakai, hm, Namaiyai, hm.'

(5) Part 3

Ɔ́pwan nái, ɔpasúún kɛɛn tukuul, hm, s'ɔítɛ kɛɛn, hm, si ɔkoonɛch ɗe ɲɛ ́ ɛ́ ́ kurúɽta. Hm.

'Come now, gather yourselves all, hm, so that you may move, hm, so that you can give us health now. Hm.'

	- a. Amóó, hm, Moisári, hm, Naiyasoi, hm. 'Mother, hm, Moisari, hm, Naiyasoi, hm.'
	- b. Ɔ́pwan nái, hm, Lɔlóísya, hm, Tóótɔ, hm, Lɛmɔ́na, hm, Ngusɛrɔ́ɔ, hm. 'Come now, hm, Loloisya, hm, Tooto, hm, Lemona, hm, Nguseroo, hm.'
	- a. Ɔpasúún nái kɛɛn tukuul, hm, si ɔítɛ kɛɛn ɔ́ɔ́kwɛ, hm, si ɔkóónu kurúɽta, hm, si ɔítɛ kɛɛn, si ɔkaachí kɔ́ɔ́ɲɛ chʊʊ, kolasiya, hm. 'Gather now yourselves all, hm, so that you can move out, hm, so that you may give health, hm, so that you can leave, so that you may give me my eyes, they will heal, hm.'
	- b. Leláa kulɔ, s'áási ɗe ɲɛɛ́ ́íya apʊ́nɛ, na alapátii, hm am(ʊ) kakílɛ ɗe, arkoyáii, s'áási ɗe ɲɛɛ́ ́am kakílɛ ɗe, arkoyáii ólta, arkóe na kiparé phii, hm.

'You folks, so that I find a way to pass and run, hm, because it was just said, the words are broken, there are people who are being killed, hm.'

The structure of the 2010 *kanaítaísɛ* recording (text 2 below) is similar to that of text 1 above. In this ceremony, first a female and then a male elder take part. In part 1, N Muringa (M) explains that guests who want to know more about the Ngababa Akie community and the Akie language have arrived in Ngababa. Accordingly, the Akie community members in this village organized various events for their visitors, including a blessing/offering ritual which took place on 2010-08-04. In text 2, part 1, M produces *hm* three times. In text 2, part 2, L follows the pattern that he had demonstrated in his 2009 performance above (text 1).

	- a. Iki yai khuumi, phaapha, akipwaan nai iyaa iinte phaapha Teeye ai Patina.

'This beer now, Father, we have just come where you are, Father Teeye and Patina.'

b. Ichi khuumi opwaan, oeesien. Hm. Ko phii chaa ko tayee chu chaa kapwa, kolei:

'These beers, come and drink them, hm. There are these guests who have come, saying:'



The part 1 text continues but is not included here, because no DM *hm* is used. The same applies to the introductory section of text 2, part 2, which is skipped because no *hm* occurs in it.

	- a. ... Ko akwe; kikuure.
		- '... It is you; you are called.'
	- b. Opwan nai de duoi. 'Just come now.'
	- c. Ophasuun keen, asiiswe, hm. 'Assemble yourselves, ancestors, hm.'
	- d. Phaapha, hm, Saphai, hm, kos'oopwan nai de duo. 'Father, hm, Saphai, hm, so that you (all) just come.'
	- e. Ophasuun keen, hm. 'Assemble yourselves, hm.'
	- f. Loitiaki Kaiyakai, Oltungu, opwan nai, ophasuun keen tukuul. 'Loitiaki Kaiyakai, Oltungu, come now, assemble yourselves all.'
	- g. Kaiyakai, opwan nai, hm, nen iyu. Hm. 'Kaiyakai, come now, hm, here. Hm.'
	- h. Taiko, hm, Kisema, hm, Alapuopuo, hm, Nguseroo, hm. 'Taiko, hm, Kisema, hm, Alapuopuo, hm, Nguseroo, hm.'

Altogether, L produced the DM *hm* at least 21 times. It occurred 10 times in what could be interpreted as a roll call of the ancestors. When L addressed the ancestors in a more general way (not calling them by their names), the DM was used another 11 times. Another blessing/offering ritual was recorded on 2013-01-31 by Christa König and Bernd Heine in Losekito (Gitu). Text 3 below is an extract from their transcript of a ceremony performed by Nkauli Samakuya (NS). In NS's speech, the DM *hm* is even more widely used (namely a total of 58 times) than in texts 1 or 2 above. Here are samples of this 2013 material.

	- a. Óeesien, hm, ásiiswé ikáá kíyaí pa Mókiri, hm. [1/61] 'Drink, hm, ancestors of this country of Mokiri, hm.'
	- b. Keleí: Hm, óyumuyumun náá kɛn, hm, áko Kɪkɛkɔ́, hm. [1/65] 'We say: Hm, gather yourselves, hm, Kikeko people, hm.'
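
The tallies of *hm* reported for texts 1 to 3 could, in principle, be cross-checked against plain-text versions of the transcripts. The following is only a minimal sketch of such a count, not a description of how the project's figures were obtained: the function name `count_hm` is hypothetical, and the sample lines are items b, c, g and h of text 2, part 2, as given above.

```python
import re
import unicodedata


def count_hm(transcript: str) -> int:
    """Count standalone tokens of the discourse marker 'hm' in a transcript."""
    # Decompose characters and drop combining marks so that tone-marked
    # forms such as 'hḿ' are counted together with plain 'hm'.
    decomposed = unicodedata.normalize("NFD", transcript)
    plain = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # \b keeps 'hm' from matching inside longer words; case is ignored
    # because the marker is capitalised after a full stop ('Hm.').
    return len(re.findall(r"\bhm\b", plain, flags=re.IGNORECASE))


# Sample lines copied from text 2, part 2 (items b, c, g and h).
sample_lines = [
    "Opwan nai de duoi.",
    "Ophasuun keen, asiiswe, hm.",
    "Kaiyakai, opwan nai, hm, nen iyu. Hm.",
    "Taiko, hm, Kisema, hm, Alapuopuo, hm, Nguseroo, hm.",
]

per_line = [count_hm(line) for line in sample_lines]
total = sum(per_line)
print(per_line, total)
```

A count of this kind does not distinguish roll-call uses of *hm* from general ones; that distinction depends on whether an ancestor's name precedes the marker and would still have to be annotated by hand.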

The texts produced by even experienced Akie speakers such as Papalai Kalisya on 2013-10-30 (in a short blessing/offering ceremony of two minutes) and, earlier, B. Nkoiseyyo together with Nkúyáki Rokina in Losekito (Gitu) on 2013-01-31 (more elaborate and longer, but very general) lack the mock dialogue with the community's ancestors. The names of the ancestors are no longer quoted, probably because they may no longer be known. Consequently, the emphasis of other, more recent blessing ceremonies witnessed during fieldwork has shifted to the prevailing situation and its hardships for the Akie.

## **5 To sum up**

This paper has dealt with some aspects of the documentation of the Akie language using the example of ceremonial discourse as part of the ancestral cult. In this respect the function of the DM *hm* was discussed which, in the texts included here, is associated with blessing/offering rituals that have been recorded since 2009 in Akie community settlements.

The text samples selected above demonstrate how the ceremonies emphasize maintaining and consolidating a ritual relationship with the community's ancestors. The respect bestowed on deceased members of the community and, from the perspective of the living, the unilateral contact with them form a substantial pillar of Akie traditions and heritage.

The discourse strategy in *kanaítaísɛ* employs two principal ways of addressing ancestors, who are expected to take note of what they are being told. The first method is more general, speaking to them without any name, while the second entails appealing to them directly by name. The ancestors are deemed to be responding to being addressed in these ways. For example, the performer of the ceremony utters on behalf of the *asííswe* a short mock response by way of the DM *hm*. When the ancestors are appealed to directly by name in a type of roll call, the mock response *hm* corresponds with 'here', 'yes' or 'okay' in English, and indicates that somebody is present. In terms of the ceremony "dialogue" content per se, not only are the ancestors invited to *kanaítaísɛ* to drink beer and to eat, they are also requested to pave the way for the arrival of guests or latecomers, ease the passage for community members that are leaving, and bestow peace and well-being on the Akie people.

The material presented here exemplifies the versatility of the Akie language, which, however, is becoming more and more rare. Both N Lesakat and N Muringa, who were known and respected as guardians of the language (displaying sophisticated language use, rich in vocabulary and grammatical structures) and of Akie history, traditions and other facets of Akie life, have passed away.

Another aspect of continuing to uphold Akie traditions and heritage has to do with maintaining *kanaítaísɛ* in view of the small number of Akie-speaking persons who may be exposed to this and other communicative events. Given that many Akie people live scattered and isolated in small settlements far from each other, it is often not possible to gather enough participants to enact any ritual. This is why ceremonies can only be practiced in a few places such as Losekito and Napilukunya, where a reasonable number of Akie speakers can still be found.

In conclusion, therefore, the future of the Akie blessing/offering ritual in the traditional manner – or even in a modern version which was referred to above – is bleak. Taking this situation into account, the assessment of ongoing language change, attrition and loss as portrayed in this paper with respect to *kanaítaísɛ* as one of the pillars of the hunter-gatherer Akie community serves to corroborate once again Akie's status as a critically endangered Tanzanian language.

### **Abbreviations**


## **Acknowledgments**

The authors hereby express their gratitude to the Volkswagen Foundation, which has generously supported and financed the extensive activities of two documentation projects since 2012. Similarly, the SIDA support for KL's field work in 2009/2010 is acknowledged with thanks. In this regard, the University of Dar es Salaam has authorized KL's Endangered Languages in Tanzania research project (as part of LoT cooperation) and granted permission for the field research under the Akie (Ref. No.: AB3/3(B)). There was also exemplary support from regional authorities (Tanga and Manyara). At the grassroots level the cooperation with Akie language experts, speakers and other Akie community members was excellent. The commitment shown by the local resource persons and Tanzanian assistants, as well as their initiative have left an indelible impression. A pre-final version of this paper was checked language-wise by Ms. Sandra Fitchat (Swakopmund, Namibia). Her language revision and paper comments were substantial and stimulating. However, the views expressed in this paper are entirely those of the authors. Ahsanteni sana – many thanks to all.

## **Appendix**

	- b. Akiye Akiye.a ká gen mókiri Mokiri.a *hḿ* hm ǹte exist ɗé dm iyu here 'Akiye of Mokiri, *hm*, exist here.' (1/62)
	- c. *hḿ* hm kɔ cop nɪnyɛ 3.sg.a ɗé dm ni rel.sg kíí 1.pl ntei-yak be.together nen prep iyû here nkúyaki Nkuyaki *hm* hm '*Hm*, it is him who we were all together with here in Nkuyaki, *hm*.' (1/63)
	- d. inkúyaki Inkuyaki ɗe dm ko cop inkúyaki Inkuyaki.a ɗe dm ni d.pr.sg *hḿ* hm 'It is Inkuyaki there, *hm*.' (1/64)
	- e. ke 1.pl leí say *hḿ* hm ó 2.pl yumuyumu-n gather.s-ven-imp náá rel.sg kɛn rfl *hḿ* hm áko coll.a kɪkɛkɔ́ Kikeko *hḿ* hm 'We say: *Hm*. You who are Kikeko folks, *hm*, should gather all together.' (1/65)

	- f. ɛlá folks kɔlɔ excl *hḿ* hm iciɗeí here kaai town.a táá gen kisíko Kisiko.a *hḿ* hm kai town.a táá gen kisíko Kisiko.a *hḿ* hm 'Folks, *hm*, here the town of Kisiko, *hm*.' (x2) (1/66)

# **References**


# **Chapter 21**

# **A phylogenetic classification of Luyia language varieties**

Michael R. Marlo<sup>a</sup>, Rebecca Grollemund<sup>a</sup>, Thanh Nguyen<sup>a</sup>, Erik Platner<sup>a</sup>, Sarah Pribe<sup>b</sup> & Alexa Thein<sup>c</sup>

<sup>a</sup>University of Missouri <sup>b</sup>Ohio State University <sup>c</sup>Washington University in St. Louis

This paper presents the results of a comparative study of the Luyia cluster of Bantu languages spoken in western Kenya and eastern Uganda. We propose a new classification of Luyia and neighboring languages using phylogenetic methods. Our study is based on a 200-item wordlist of basic vocabulary, representing 33 language varieties from the Luyia cluster and its closest neighbors, including Ganda, Gwere, and Soga to the west, and Gusii and Kuria to the south. Our results are broadly consistent with past classifications by Mould (1976, 1981) and Williams (1973), but refine our understanding of the relatedness of the target languages by employing more extensive data from more languages within the Luyia cluster and others in the region.

## **1 Introduction**

The Luyia language cluster consists of around 20 Bantu language varieties spoken in western Kenya and eastern Uganda. A map with several Luyia varieties and neighboring Bantu languages is shown in Figure 1. The languages that we refer to as part of the Luyia cluster are circled. This includes the Kenyan language varieties spoken by members of "Luyia" or "Luhya" ethnic communities that were politically united in the first half of the 20th century (see MacArthur 2016) as well as closely related linguistic varieties on the Ugandan side of the border that were not part of the ethnopolitical unification that took place in Kenya.

Michael R. Marlo et al. 2022. A phylogenetic classification of Luyia language varieties. In Galen Sibanda, Deo Ngonyani, Jonathan Choti & Ann Biersteker (eds.), *Descriptive and theoretical approaches to African linguistics: Selected papers from the 49th Annual Conference on African Linguistics*, 383–407. Berlin: Language Science Press. DOI: 10.5281/zenodo.6393807


A second map of Kenyan Luyia varieties from Heine & Möhlig (1980: 35) is given in Figure 2.

Figure 1: Map of the Luyia language cluster and its nearest neighboring Bantu languages

The overarching goals of this project are to understand the relationship between the languages of the Luyia cluster and their closest neighbors and to understand the internal structure of the Luyia cluster. To achieve these goals, we propose a new classification of Luyia and its nearest neighbors using phylogenetic methods.

## **2 Prior classifications of Luyia**

In this section, we provide a detailed overview of prior classifications of Luyia. These include geolinguistic classifications in §2.1, genetic classifications in §2.2, a "dialectometrical" classification in §2.3, and the referential classification found in *Glottolog* in §2.4. We conclude the section by identifying some lingering questions about specific varieties of Luyia mentioned in prior classifications in §2.5.

#### **2.1 Geolinguistic classifications (Guthrie 1948, 1967, Maho 2009, Lewis et al. 2016)**

The first geolinguistic classification of Luyia was established by Guthrie (1948, 1967) and updated by linguists in Tervuren. These results, which were largely adopted by the *Ethnologue* (Lewis et al. 2016), are presented in an accessible way in Maho (2009). In these classifications, most language varieties of the Luyia cluster fall under JE30, the Masaaba-Luyia Group, shown in Table 1. Within this classification, a large number of central Kenyan Luyia varieties fall within JE32, the so-called "Lu(h)yia cluster".

Figure 2: Map of Kenyan Luyia varieties (Heine & Möhlig 1980: 35)

Many broad aspects of this classification are uncontroversial – for instance, the fact that the language varieties of JE30 should be grouped together. One aspect of the Guthrie/Maho classification that is controversial is the placement of a few language varieties that we consider part of the Luyia cluster outside of the JE30 decimal series. As shown in Table 2, the southeastern varieties Isukha (JE412), Itakho (JE411), Logooli (JE41), and Tiriki (JE413) are in JE40, the Logooli-Kuria Group, along with Gusii (JE42) and Kuria (JE43), and several other languages of the Mara region of northwest Tanzania. Additionally, as shown in Table 3, Maho (2009) places Nyala West in JE18, part of JE10 along with Soga and Ganda.

Nyala West is sometimes referred to as "West Nyala" and Nyala East is sometimes referred to as "East Nyala" or "Nyala K", referring to Kakamega – the name of the county where Nyala East is spoken. As shown in Figure 2, Nyala East is surrounded by several other Luyia varieties: Bukusu, Tachoni, Kabarasi, Wanga, and Tsootso. Nyala West is spoken in Busia County adjacent to the southwestern variety Saamia.



Table 1: JE30: Masaaba-Luyia Group (Maho 2009)

Table 2: JE40: Logooli-Kuria Group (Maho 2009)

In 2007, Luyia was introduced as a "macrolanguage" in the *Ethnologue*, and many of the Luyia varieties were recognized with distinct ISO codes. As part of this change, Nyala East was renamed "Olunyala" (i.e. "Nyala" without the cl. 11 noun class prefix), and although Nyala West was not mentioned in ISO 639-3 Change Request Number 2007-171, Nyala West was also included as part of Nyala in the changes implemented in the ISO 639-3 reclassification. Nyala West and Nyala East were thus unified as part of JE32 in the 16th edition of *Ethnologue*, but this seems to have been an accident resulting from the shared language name and the lack of mention of Nyala West in the change request. A subsequent ISO change request (2014-001) to reintroduce Nyala East and Nyala West as separate language varieties with distinct ISO codes was rejected, citing a lack of linguistic evidence. (The authors of the change request failed to cite Heine & Möhlig (1980), which includes Nyala East in the northern-central Bukusu-Wanga cluster and Nyala West in the southwestern Saamia-Nyala cluster.) This decision failed to recognize that the merger of these two languages seven years earlier had also been done without any linguistic evidence or even a specific request to merge the two language varieties under one name.

#### **2.2 Genetic classifications (Mould 1976, 1981, Williams 1973, Nurse & Philippson 1980)**

Table 3: JE10: Nyoro-Ganda Group (Maho 2009)

The first genetic classification of Luyia language varieties was done by Williams (1973), who presents a relatively extensive internal classification of Luyia with data from 16 Luyia varieties. Williams (1973) primarily uses lexicostatistic methods and the 200-item Swadesh list.<sup>1</sup> Lexicostatistics – the method developed by Swadesh (1952) – measures the percentage of shared cognates by comparing words from the Swadesh basic vocabulary list across two or more related languages. Williams' (1973) approach yields a geographically based clustering of varieties, shown in Figure 3, which has a flat structure with five branches: Western, Northern, Central, Eastern, and Southeastern. Note that in contrast with the Guthrie/Maho system, the southeastern varieties Isukha, Itakho, Logooli, and Tiriki, and the southwestern variety Nyala West are treated as part of Luyia in Williams' (1973) classification.

<sup>1</sup>Williams (1973) also identifies some phonological correspondences across varieties and compares the noun class prefixes across Luyia varieties.
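The lexicostatistic measure described above can be illustrated with a minimal sketch. The variety names, concepts, and cognate codes below are hypothetical toy values rather than data from Williams (1973) or any other source, and the function name `shared_cognate_percentage` is our own; the sketch simply assumes that cognacy judgments have already been coded as integers and that missing entries are skipped.

```python
from itertools import combinations

# Hypothetical cognate-class codes (toy values, not real data): for each
# concept, varieties sharing the same integer are judged cognate; None marks
# a concept with no entry in that wordlist.
cognate_codes = {
    "Variety_A": {"hand": 1, "water": 1, "tree": 1, "eye": 1},
    "Variety_B": {"hand": 1, "water": 1, "tree": 2, "eye": 1},
    "Variety_C": {"hand": 2, "water": 1, "tree": 2, "eye": None},
}


def shared_cognate_percentage(codes_x, codes_y):
    """Percentage of concepts attested in both lists whose forms are cognate."""
    comparable = [
        concept
        for concept in codes_x
        if codes_x[concept] is not None and codes_y.get(concept) is not None
    ]
    if not comparable:
        raise ValueError("no concepts attested in both wordlists")
    shared = sum(codes_x[c] == codes_y[c] for c in comparable)
    return 100.0 * shared / len(comparable)


# Pairwise percentages for every pair of varieties.
percentages = {
    (x, y): shared_cognate_percentage(cognate_codes[x], cognate_codes[y])
    for x, y in combinations(sorted(cognate_codes), 2)
}
```

Pairs with higher percentages would be grouped together first in a lexicostatistic classification; how the percentages are then turned into a tree varies from study to study and is not shown here.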

Figure 3: Williams' (1973) classification of Luyia

Mould's (1976, 1981) classifications generally accord with Williams (1973), though Mould worked with different languages. Using a 200-item wordlist, Mould (1976, 1981) carried out a lexicostatistic analysis including five Luyia varieties (Bukusu, Itakho, Logooli, Saamia, and Wanga) along with Ganda and Soga to the west and Gusii to the south. Mould's (1976, 1981) results, summarized in Figure 4, show the overall unity of Luyia, as the varieties we consider part of the Luyia cluster are more similar to one another than they are to Ganda, Soga, and Gusii. Within Luyia, the Southeastern varieties Itakho and Logoori branch off from Saamia, Wanga, and Bukusu, which are divided into a Northern branch with Bukusu and a Western-Central branch with Saamia and Wanga.

Figure 4: Mould's (1981: 185) classification of Luyia based on lexicostatistics

Mould (1981: 201) also considers sound change in creating a second tree with similar results, shown in Figure 5. There are two differences in this tree: (i) Logoori branches off from Itakho at a higher level in the tree, and (ii) Bukusu, Saamia, and Wanga are not subdivided further. Mould (1981: 201) also computes a third tree based on a comparison of tense/aspect markers; its results are identical to the tree based on sound change.

Nurse & Philippson's (1980) classification is based on a lexicostatistical study with a 400-item wordlist. It includes four Luyia language varieties and essentially gives the same results as Mould (1976, 1981) but with fewer languages. As shown in Figure 6, there is a geographical split that divides Saamia and Bukusu from Itakho and Logoori. Nurse & Philippson (1980) treat this as a North-South split, though "West" vs. "East" appear to us to be equally tenable labels for the two groups.

Figure 5: Mould's (1981: 201) classification of Luyia based on sound change

Figure 6: Nurse & Philippson's (1980) classification of Luyia

#### **2.3 Dialectometrical classification (Heine & Möhlig 1980)**

The classification of 15 varieties of Kenyan Luyia by Heine & Möhlig (1980) is part of a large-scale study of languages in Kenya, which includes a wordlist of 640 concepts, documentation of each variety's phonological system, and basic features of grammar (Heine & Möhlig 1980: 9). Their classification of Luyia is an areal grouping, based on "geographical and synchronic dialectal proximity" (Heine & Möhlig 1980: 13). Heine & Möhlig (1980: 32) state that the Luyia varieties "neither form a single dialect cluster nor even represent dialects of variations of a single language. The term [Luyia] as such is geographical and has no further dialectological significance." Internal subgroupings are based on "dialectal proximity", which measures the degree of similarity in linguistic features, e.g. isoglosses, across dialect clusters.

The four subgroupings established by Heine & Möhlig (1980) are shown in Figure 7. Logooli is viewed as a "separate language", and three other "cluster[s] of dialects" are identified: a Southwestern cluster, a Central-Northern cluster, and a Southeastern cluster. The separation of Logoori from the Southeastern languages and the inclusion of Central and Northern varieties in a single larger branch is similar to Mould's (1976, 1981) classifications but different from Williams (1973).

Figure 7: Heine & Möhlig's (1980) classification of Luyia

The authors indicate that the linguistic data undergirding the atlas would be published in future volumes, but no subsequent studies on Luyia languages were published from the project. See Heine (2013) for a later overview of the *Language and Dialect Atlas of Kenya* project and see Möhlig's (1985) review of Angogo Kanyoro (1983) for some additional comments on his "dialectometrical" analysis of the Luyia area, which includes the identification of three "dialect continua": Marachi-Khayo in the west, Kabarasi-Tachoni-Nyala East in the east, and Nyore in the south.

#### **2.4 Referential classification (Hammarström et al. 2020)**

A recent referential classification of "Greater Luyia", which includes both Kenyan and Ugandan language varieties, is found in the *Glottolog* (Hammarström et al. 2020). The *Glottolog* 4.3 classification, which is based on secondary materials – primarily Mould (1981) – is shown in Figure 8. As noted above, Mould (1981) deals with only a small subset of the language varieties represented in the *Glottolog* classification. It appears that the many other language varieties present in the *Glottolog* classification are populated from uncited sources, with the *Ethnologue* database being a possible source. The *Glottolog* represents the prior classification with the most Luyia language varieties displayed in a tree format, but it is not a genetic classification based on original data, and the justification for many aspects of its structure is unclear.

Our representation of the *Glottolog* classification given here in Figure 8 differs from the original in a few ways. First, we harmonized some language names (e.g. "Idakho" → "Itakho", "Kabras" → "Kabarasi"), and we eliminated the cl. 11 *ulu*- noun class prefix from the languages under the Masaaba node.

Figure 8: Glottolog 4.3 classification of Luyia (Hammarström et al. 2020)

Second, two languages are listed in multiple locations in the *Glottolog* tree, and we have retained the languages in only one position. Tachoni is given under a node for Bukusu where it is contrasted with "Nuclear Bukusu" and under the Central Luyia node. We have not seen evidence for two distinct varieties of Tachoni. Following Odden (2009: 305), who states that "Tachoni most resembles "mainstream" Luhya varieties such as Tsootso, Nyala, Wanga, Kisa, and Marachi, least resembling Bukusu, [Gisu], and Logoori," we have included Tachoni only under the Central Luyia node, and we have simplified the Bukusu node, eliminating "Nuclear Bukusu". Bukusu is also listed in two locations in the *Glottolog* tree: under the "Bukusuic" node with Kabarasi and under Masaaba, where presumably it refers to Ugandan Bukusu. Lacking evidence for a distinct form of Ugandan Bukusu, we have not included Bukusu under Masaaba.

Third, Nyole and its subvarieties are considered "Unclassified Luyia" in the *Glottolog* and are listed as a highest level branch of Greater Luyia. We have maintained the position of Nyole within the tree but have removed the "Unclassified Luyia" label, acknowledging here that further study of Nyole and other Bantu language varieties of eastern Uganda might place those languages in a different position within the tree. Maho (2009) includes Nyole (JE345) within JE34, which makes it most similar to other southwestern varieties Saamia (JE34), Khayo (JE341), and Marachi (JE342); no other classifications indicate the position of Nyole within Greater Luyia, as far as we know.

Given that Mould (1981) is cited as the basis for the *Glottolog* classification, it is not surprising that the *Glottolog* tree reflects the results of Mould (1981) (see Figures 4–5 above). However, as Mould (1981) includes only 5 language varieties of Greater Luyia (Logoori, Itakho, Bukusu, Wanga, Saamia), many details concerning the internal structure of Luyia are underspecified in Mould (1981). These uncertain details include the grouping of other languages with those studied by Mould as well as decisions to subdivide Mould's (1981) groupings further.

Although it is unclear how the *Glottolog* arrived at its classification, several aspects of its structure are consistent with the results of other research:



A few aspects of the *Glottolog* classification differ from other prior classifications:


#### **2.5 Issues in prior classifications**

There are a few questions about some of the specific varieties listed in past classifications. Within the Masaaba cluster, the *Ethnologue* treats Gisu and Masaaba as alternative names for the same language, of which Kisu (JE31b) is another alternative name. We are doubtful that there are distinct language forms "Gisu" vs. "Kisu", but we have retained Maho's labels in Table 1 because distinct codes are given to the two language names. The *Glottolog* also includes both "Gisu" and "Kisu", but we are not aware of any discussion of this distinction in the literature.

Buya (JE31G) and Dadiri (JE31F) are treated as dialects of Masaaba in Maho (2009) and the *Glottolog*. Despite the existence of an early grammar (Purvis 1907), a robust wordlist (Siertsema 1981), and a book on Gisu/Masaaba dialect variation (Brown 1972), there is little available information concerning the classification of Gisu/Masaaba language varieties. Similarly, we are unaware of materials dealing with the classification of the Nyole language varieties identified in the *Glottolog*.

A further question concerns Syan, a variety described in Huntingford (1965) based on materials collected in 1924 from Syan migrant workers in Uasin Gishu province in Kenya. Huntingford confirmed that Syan people lived in Bulegenyi District in eastern Uganda as late as 1930, though other investigators reported not encountering a Syan ethnic group in that area. Syan is no longer listed in the most recent versions of the *Ethnologue*, and we are not aware of any additional research on the language. Citing Schoenbrun (1994), the *Glottolog* states that "Syan is a missing language of the North Nyanza subgroup of Bantu, which [is] lexicostatistically too divergent" to be [mutually] intelligible [with] any other language in JE10 or JE20. In the *Glottolog* classification, North Nyanza includes Ganda, Gwere, Kenyi, Lamogi, and Soga. Maho (2009), however, does classify Syan as part of JE31 (JE31D).

It is also unknown how Songa (JE343) fits in. The *Ethnologue* lists it as a dialect of Saamia, and Heine & Möhlig (1980: 32) include it as part of their Saamia-Nyala dialect cluster, with which it is geographically adjacent (see the map in Figure 2). Heine & Möhlig (1980: 32), who focus on languages in Kenya, list a population of 10,000 Songa speakers. This figure is consistent with the fact that the 1979 Kenyan census, cited by Were & Odak (1987: 26), identified 9,000 inhabitants of Usonga location in Siaya District. As reported in Marlo (2007: 2-3), Marlo attempted in 2006 to collect information on Songa, but it was unclear how the variety differed from Nyala West. Confusingly and possibly erroneously, various editions of the *Ethnologue* (e.g. 13th edn.) identify 10,000 speakers of Songa in Uganda, but we have been unable to find any other sources that identify a Songa language variety of Uganda. As noted by Marlo (2009), a 2004 report by the SIL Language Assessment team on Gwe and Saamia in Busia District, Uganda (Anderson et al. 2004), does not mention Songa in its results. We have not seen any Songa data in the literature.

There is also at least one Luyia variety not listed in prior classifications, Tura, which is described in Marlo (2008). Although the exact classification of this variety is unclear, it is geographically and linguistically most proximal to Bukusu, Khayo, and Wanga, and should fit in with the JE30 languages.

A few communities of Greater Luyia have offshoots in the diaspora. Marlo et al. (2017) provide a description of a variety of Nyole spoken in southern Busoga, Uganda. In addition, through various migrations and resettlement patterns, there are sizable communities of Logooli speakers in western Uganda and southern Kenya, around Migori (Chavasu 1997, Heine & Möhlig 1980: 70). As some of the diasporic Luyia communities have been separated for 50–60 years or more, in different contact situations, it might be appropriate to treat some of them as distinct varieties or at least to investigate them separately, leaving open the possibility that they are distinct.

To conclude, over the past 50 years, different techniques have been applied to the study of Luyia languages in order to better understand their internal classification: referential classifications, lexicostatistical studies, "dialectometric" studies, and (rarely) classifications based on the study of linguistic innovations. Because these studies differ in the type of data selected (geography, cognate sets, or sound changes), in their methods (lexicostatistics or the use of shared phonological innovations to make groupings), and in the number of languages selected (from 5 to 16), they sometimes lead to different classifications. However, we do find some accordance in their conclusions: the Luyia group includes languages such as Bukusu, Itakho, Saamia, and Wanga, and most sources recognize the distinctness of southeastern Luyia varieties and the fact that Logoori is the most distinct within Luyia or possibly even a separate language.

## **3 Methodology**

In light of recent developments in the field of historical linguistics which include the appearance of phylogenetic methods borrowed from the field of biology being used to classify languages, we decided to propose the first phylogenetic study of the Luyia languages. Phylogenetic methods are based on a simple principle: languages and species evolve in a similar way, by a process of descent with modification. Therefore, when similarities are observed between species or languages, they can be explained by a common ancestor from which they have descended. By extension, the evolutionary tools used to investigate biological evolution in order to classify organisms in terms of their genealogical relation to one another can also be applied to the study of languages.

In order to carry out our analysis, we compiled wordlists from several sources into a database. These include prior research by Brown (1972), Williams (1973), and Nurse & Philippson (1975), as well as more recent work on Luyia languages by members of our extended research team. This includes a series of Luyia lexical materials that Michael Marlo collected in 2006 based on Appleby (1943), Swadesh lists for several varieties collected in 2016 and 2018, and more extensive lexical research carried out on Bukusu, Tiriki, and Wanga in 2016. It also includes a Swadesh list collected by Deo Kawalya in 2016, and data extracted from lexical materials collected by Kristopher Ebarb in 2012–2013 and David Odden in 2014–2018. The 61 total datasets in our database at the time of our analysis in 2018 are listed in Table 4.


Table 4: Datasets in our database



For the present classification, we used a subset of the datasets in our database. We eliminated languages that had too few words represented in the 200-item Swadesh wordlist. This removed the datasets from Brown (1972) and Mould (1976), and a few others such as those on Upper Nyole and Lower Nyole, which are based on the 100-item Swadesh list. We also eliminated a handful of our research team's datasets where we felt we had a more accurate dataset for the same language. For instance, several of Marlo's 2006 wordlists are preliminary lists of translated words in a practical orthography provided by a speaker working alone that have not been vetted by a linguist. For several varieties, there are more recently developed wordlists with more reliable data (e.g. the materials by Ebarb and Odden), and in such cases we did not use the materials from 2006. We have also collected some Swadesh lists (e.g. on Kabarasi and Kuria) since we completed the analysis reported here; such data are also not included in the results reported.

Our analysis is based on 33 primary datasets, plus 4 outgroup languages: Ha (JD66), Vinza (JD67), Lega (D25), and Yaka (H31). The 33 primary datasets, which are indicated with an asterisk in Table 4, represent 29 language varieties: 16 Luyia varieties (e.g. Bukusu, Tiriki, Wanga) and 13 Bantu languages to the west (e.g. Ganda, Soga) and south (e.g. Kuria, Gusii).

We eliminated words from the 200-item Swadesh wordlist that had entries from fewer than 21 datasets. As a result, our analysis is based on 151 entries from the 200-item Swadesh wordlist.
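The coverage filter described above can be sketched as follows. This is only an illustration under stated assumptions: the dataset names and placeholder forms are hypothetical, the function name `filter_by_coverage` is our own, and the toy threshold of 2 stands in for the threshold of 21 datasets mentioned in the text.

```python
def filter_by_coverage(wordlists, min_datasets):
    """Return the concepts that have an entry in at least `min_datasets` datasets.

    `wordlists` maps a dataset name to a dict of concept -> form; an absent
    key or an empty string counts as a missing entry.
    """
    concepts = sorted({c for entries in wordlists.values() for c in entries})
    kept = []
    for concept in concepts:
        attested = sum(1 for entries in wordlists.values() if entries.get(concept))
        if attested >= min_datasets:
            kept.append(concept)
    return kept


# Hypothetical toy database with placeholder forms (not real lexical data).
wordlists = {
    "dataset_1": {"hand": "form_1", "water": "form_2", "louse": ""},
    "dataset_2": {"hand": "form_3", "water": "form_4"},
    "dataset_3": {"hand": "form_5", "louse": "form_6"},
}

kept_concepts = filter_by_coverage(wordlists, min_datasets=2)
```

In this toy example "louse" is dropped because only one dataset supplies a form for it, which mirrors the way sparsely attested items were excluded from the 200-item list.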

Once the database of 35 datasets of 151 words was established, we carried out cognacy judgments for each of the 151 words. We used color-coding to form cognate sets based on predictable sound changes between the languages, as in Figure 9. We then transformed the colors into numbers to use for further analysis, as shown in Figure 10.
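The step from color-coded cognate sets to numbers can be sketched as below. The color labels, variety names, and function names are hypothetical. The second function, which recodes the numbers as one presence/absence character per cognate class, is an assumption on our part: it is a common input format for phylogenetic software, but the text does not state which encoding was actually used.

```python
def colors_to_numbers(color_codes):
    """Map each color label within a concept to an integer cognate class."""
    numeric = {}
    for concept, by_variety in color_codes.items():
        mapping = {}  # color label -> integer, restarted for every concept
        numeric[concept] = {}
        for variety in sorted(by_variety):
            color = by_variety[variety]
            if color is None:  # no entry for this concept in this variety
                numeric[concept][variety] = None
                continue
            if color not in mapping:
                mapping[color] = len(mapping) + 1
            numeric[concept][variety] = mapping[color]
    return numeric


def to_binary_matrix(numeric):
    """Assumed encoding: one 0/1 character per (concept, cognate class)."""
    varieties = sorted({v for by_variety in numeric.values() for v in by_variety})
    characters = sorted(
        {
            (concept, cls)
            for concept, by_variety in numeric.items()
            for cls in by_variety.values()
            if cls is not None
        }
    )
    matrix = {}
    for variety in varieties:
        row = []
        for concept, cls in characters:
            value = numeric[concept].get(variety)
            # "?" marks missing data rather than absence of the cognate class.
            row.append("?" if value is None else ("1" if value == cls else "0"))
        matrix[variety] = "".join(row)
    return characters, matrix


# Hypothetical color judgments for three varieties and two concepts.
color_codes = {
    "hand": {"A": "red", "B": "red", "C": "blue"},
    "water": {"A": "green", "B": None, "C": "green"},
}
numeric = colors_to_numbers(color_codes)
characters, matrix = to_binary_matrix(numeric)
```

Keeping missing entries distinct from genuine absences matters because a variety without a recorded form should not be counted as lacking the cognate.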

Next, we built two graphical representations: a network shown in Figure 11 and a Bayesian tree shown in Figure 12. The network was built using the Neighbor-Net algorithm (Bryant & Moulton 2004), a distance-based method that calculates the distance between pairs of languages in order to produce a distance matrix. Distances between languages are measured by the percentage of cognates shared. The Bayesian method allows the construction of a sample of trees. The use of the Markov chain Monte Carlo (MCMC) approach (Larget & Simon 1999, Pagel & Meade 2004) allows us to sample trees in proportion to their likelihood. In the tree presented in Figure 12, we can see numbers under the nodes. These numbers correspond to the posterior probability of each node on the tree (which is similar to the proportion of trees in the sample containing that node).
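The reading of node support as the proportion of sampled trees containing a node can be made concrete with a small sketch. It assumes that an MCMC run has already produced a sample of trees and that each tree is represented simply as the set of clades (sets of leaf names) it contains. The leaf names, the four-tree sample, and the function name `clade_support` are hypothetical, and the sketch does not perform any MCMC sampling itself.

```python
from collections import Counter


def clade_support(tree_sample):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter()
    for clades in tree_sample:
        counts.update(set(clades))  # count each clade at most once per tree
    n_trees = len(tree_sample)
    return {clade: count / n_trees for clade, count in counts.items()}


# Hypothetical sample of four trees over the leaves A, B, and C.
sample = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"B", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
]

support = clade_support(sample)
```

In this toy sample the grouping of A with B appears in three of the four trees, so it would be reported with 75% support, while the node uniting all three leaves appears in every tree and would be shown without a number under the convention used in Figure 12.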

The network presented in Figure 11 displays the relationships between languages studied. If we want to measure the closeness or the distance between two languages, we have to look at the path from language X to language Y. If the path involves a great number of rectangles, it means that the languages are not closely related, e.g. Zanaki (JE44) and Logooli (JE41). But if the path between two languages is short (small number of rectangles), we will consider the two languages close to each other, e.g. Zanaki (JE44) and Shashi (JE404).

### **4 Results**

The analysis of the network presented in Figure 11 shows three main groups: the Luyia group (pink), the Ganda group (green) composed of JE10 languages, and the Kuria group (blue) composed of JE40 languages. The Ganda group (green) is more closely related to the Luyia group (pink) than the Kuria group (blue) is. According to Fitch (1997), the analysis of the webbing or "netting" in a phylogenetic network allows us to visualize alternative histories, because phylogenetic networks are a generalization of phylogenetic trees that display conflicting signals or alternative evolutionary histories in a single diagram (Bryant & Moulton 2004).

Figure 9: Establishing cognate sets

Figure 11: Unrooted Neighbor-Net network

Within the Luyia group (pink), we can distinguish several small subgroups: (i) a Central-Western group with Marama (JE32C), Kabarasi (JE32E), Wanga (JE32a), Marachi (JE342), and Nyala West (JE18), (ii) a Southeastern group with Isukha (JE412), Tiriki (JE413), and Itakho (JE411), and (iii) Logooli (JE41). The amount of webbing observed between the languages in the Luyia group (pink) suggests that these languages are similar and that they must be in a situation of contact (as opposed to the Kuria or Ganda groups, where the webbing is reduced).

The analysis of the Ganda group (green) shows two subgroups: (i) Nyoro (JE11) and Tooro (JE12), which have a very long common branch, showing that these two languages are similar (because they share a high percentage of cognate sets), and (ii) Ganda (JE15) and Soga (JE16), linked to Gwere (JE17).

Thanks to the network representation, we can also note that Gisu (JE31a) is situated in between the Luyia group (pink) and the Ganda group (green), showing that this language is close to Bukusu (JE31c) and Gwere (JE17). This position of Gisu (JE31a) is not surprising as it is geographically located between Bukusu (JE31c) and Gwere (JE17). Therefore, we can assume that Gisu (JE31a) has some cognate sets in common with the two languages.

Figure 12: Bayesian consensus tree based on a 200-item Swadesh list with 151 words. The numbers show posterior probabilities (nodes with no numbers = 100%): the numbers indicate the percentage of trees in the sample containing that node.

The analysis of the Kuria group (blue) shows three subgroups: (i) Gusii (JE42), (ii) Tari Kuria (JE43), Mago Kuria (JE43), and Ngoreme (JE401), and (iii) Nata (JE45), Ikizu (JE402), Shashi (JE404), and Zanaki (JE44). The division into three groups corresponds to the geographical location of the languages. In the network, Gusii (JE42) has a separate branch from the other Kuria languages because Gusii (JE42) is geographically remote from the other Kuria languages (see the map in Figure 1). The subgroup with the Kuria (JE43) varieties and Ngoreme (JE401) corresponds to the languages spoken near the Kenya-Tanzania border. The third subgroup includes the languages spoken in the Mara region of Tanzania: Zanaki (JE44), Shashi (JE404), and Ikizu (JE402).

The analysis of the Bayesian tree in Figure 12 is in accordance with our analysis of the network. Indeed, the analysis of the tree suggests three groups: Kuria (blue), Ganda (green), and Luyia (pink). Each of these three groups shows strong support (posterior probability: 100% for the Kuria group (blue), 95% for the Ganda group (green), and 100% for the Luyia group (pink)). The tree-like representation allows us to visualize the relatedness between the languages, and it also represents the hierarchy between the groupings and the languages studied. First of all, we can notice that the Kuria group (blue) is the first one to diverge, followed by the Ganda group (green), and then the Luyia group (pink). The ordering of the nodes implies that the Luyia group (pink) is the most recent one that has diverged from the Ganda group (in the network representation, the Luyia (pink) and the Ganda (green) groups are very close).

Within the Kuria group (blue), we can observe the same subgroupings as the ones distinguished in the network: Ikizu (JE402), Nata (JE45), Shashi (JE404), and Zanaki (JE44) vs. Ngoreme (JE401), Gusii (JE42), and the Kuria (JE43) dialects.

The Ganda group (green) splits into two groups as in the network with Nyoro (JE11) and Tooro (JE12) vs. Ganda (JE15) and Soga (JE16) linked to Gwere (JE17).

Finally, the Luyia group (pink) shows a succession of branches. The first languages to branch off are the Southwestern varieties Khayo (JE341), Gwe (JE34), and Nyala West (JE18). Then, we have a subgroup composed of the three Northern languages Nyala East (JE32F), Gisu (JE31a), and Bukusu (JE31c) – these three languages are also close in the network – followed by various Central languages branching off in succession: Marachi (JE342), then Wanga (JE32a) and Kabarasi (JE32E), then Marama (JE32C) and Nyore (JE33). Finally, there is a Southeastern group that splits into two subgroups, with Isukha (JE412), Itakho (JE411), and Tiriki (JE413) in one subgroup and Kisa (JE32D) and Logooli (JE41) in the other.

The network and the Bayesian trees show the fundamental unity of Luyia (pink group), the unity of the Bantu languages to the south of Luyia (blue group), and the unity of the Bantu languages to the west of Luyia (green group), supporting the view of Mould (1976, 1981) and arguing against the geolinguistic classifications that place southeastern Luyia within the JE40 group and Nyala West in the JE10 group. In each tree, the Ganda group of languages (green) is more closely related to Luyia than the Kuria group of languages (blue).

As for the internal structure of Luyia, we will focus our discussion on the Bayesian tree in Figure 12, which uses the 151-item wordlist and which expresses the confidence in each branch of the tree. In general, our tree expresses the unity of Southeastern Luyia with Logooli (JE41), Isukha (JE412), Itakho (JE411), and Tiriki (JE413) (and also confirms the subgrouping of Isukha-Itakho-Tiriki), but there is one surprise, which is the inclusion of Kisa (JE32D) within this cluster. Other prior studies have placed Kisa (JE32D) within a Central cluster, but our tree here unfolds more like an onion with a number of layers that are added as one moves to the west and north within Luyialand.

Several Central varieties cluster next with the Southeastern group, beginning with Nyore (JE33) and followed by the two Marama (JE32C) datasets. It is surprising that the two Marama datasets do not cluster with one another first, but the low-confidence grouping of Marama\_1 before the grouping with Marama\_2 may reflect the uncertainty of grouping the two Marama datasets together first. Next, there is a branching with a cluster that includes Wanga (JE32a) and Kabarasi (JE32E) (though the confidence in the Wanga-Kabarasi cluster is somewhat low at 68%). Next is a low-confidence (40%) branching with Marachi, followed by a surprising branching with Nyala West and then a fairly low-confidence (53%) branching with a Northern cluster that includes a Bukusu-Gisu subgroup and Nyala East. The branching with Nyala West is surprising based on geography because prior classifications like Williams (1973) include it in a cluster with Southwestern Luyia varieties like Khayo (JE341) and Saamia/Gwe. Instead, Saamia/Gwe and Khayo (JE341) attach to the Luyia cluster at the highest levels. The Bukusu-Gisu cluster is consistent with prior classifications and with the history connecting the Bukusu and Gisu communities.

### **5 Future research**

In future research, we would like to include additional languages in our database, including the JE20 languages around Lake Victoria, to better establish the position of Luyia with respect to its regional neighbors. Our collaborator Minah Nabirye has recently collected a 200-item Swadesh wordlist for Kenyi, a language that has not figured in prior classifications. We would like to work with Nabirye and Gilles-Maurice de Schryver to include as many Bantu languages of Uganda as possible, including those studied in Nabirye's (2016) dissertation.

As far as further studying the internal structure of Luyia is concerned, we would like to add data on Luyia varieties currently missing from our 200-word comparison, such as Saamia, Tachoni, and Tura, and we would like to incorporate data from another speaker for several of the datasets from Marlo's *Luyia Dictionary Project*, which collected preliminary dictionary materials based on Appleby's (1943) *Luluhya-English Vocabulary* in 14 varieties: Bukusu, Isukha, Itakho, Kabarasi, Kisa, Khayo, Logoori, Nyala West, Nyore, Saamia, Tachoni, Tiriki, Tsootso, and Tura. The highest priority are varieties like Kabarasi and Kisa, which have a somewhat unexpected position in the tree generated by the present classification.

## **Acknowledgments**

We are grateful for financial support from the University of Missouri College of Arts & Science Undergraduate Research Mentorship Program, Campus Writing Program, Honors College, Office of Undergraduate Research, and Research Board, and National Science Foundation Award BCS-1355750. We would like to thank two anonymous reviewers and University of Missouri student Bobby Love for thoughtful feedback on our paper. We also thank Kristopher Ebarb, Deo Kawalya, Minah Nabirye, and David Odden for sharing data with us, Thilo Schadeberg for providing a scanned copy of Williams (1973), Kelvin Alulu and Alfred Anangwe for their assistance in data collection, and our many language consultants for sharing their languages with us and making the present analysis possible.

## **References**


# **Chapter 22**

# **Proto-Bantu reflexes in Dhaisu**

Deo Ngonyani<sup>a</sup> , Ann Biersteker<sup>a</sup> , Angelina Nduku Kioko<sup>b</sup> & Josephat Rugemalira<sup>c</sup>

<sup>a</sup>Michigan State University <sup>b</sup>US International University-Africa <sup>c</sup>Tumaini University Dar es Salaam College

This paper is a study of Proto-Bantu reflexes in Dhaisu, a highly endangered language also known as Dhaiso, Segeju, Daisu and Kidhaisu (dhs, E56). Dhaisu is spoken in the East Usambara Mountains in northeastern Tanzania, but its closest relative is Kamba (E55). As in other studies, seven vowels are reported in this study: /i, ɪ, ɛ, a, ɔ, ʊ, u/. However, no contrast can be established between *ɪ* and *ɛ*, or between *ʊ* and *ɔ*. The data show that the Dhaisu vowel system is changing to a 5-vowel system: *\*i̧, \*i, \*e, \*a, \*o, \*u, \*u̧* > /i, ɛ, a, ɔ, u/. The most remarkable feature of this change is that, unlike other Bantu languages, which merge the mid-high vowels with the high vowels, Dhaisu is merging the mid-high vowels with the mid-low vowels. The innovation is demonstrated in (a) numerous lexical items in which Proto-Bantu *\*i* has become *ɛ* and PB *\*u* has become *ɔ*; (b) several nominal prefixes that are reconstructed in PB with mid-high vowels and now have mid-low vowels; and (c) the applicative suffix, whose PB form was *\*id* and which is now *-ɛr* in Dhaisu. The fact that the change does not seem to have affected all nominal prefixes or all verbal derivations with mid-high vowels suggests an ongoing transition. Reflexes of consonants are presented to show that they are not a result of spirantization.

## **1 Introduction**

This paper presents the sound system of Dhaisu (ISO 639-3 code *dhs*) and reflexes of Proto-Bantu sounds. Dhaisu is a Bantu language spoken in the East Usambara Mountains in the Tanga region of northeastern Tanzania. Simons & Fennig (2017) estimate there are about 5,000 speakers. Nurse (2000: 17) reported between 8,000 and 10,000 speakers. Rugemalira et al. (2019: 2), however, estimate there are about 11,000 speakers. The language is also known as Dhaiso, Daisu, Daiso, and Kidaisu and is referred to by outsiders as Segeju. However, Segeju is actually a different language spoken on the coast (Nurse 1982). Dhaisu is coded E56 in Guthrie's classification (Guthrie 1967–1971, Maho 2009), in a group that includes Gikuyu (E51), Embu (E52), Meru (E53), Tharaka (E54), Cuka (E541), and Kamba (E55), a group commonly referred to as Thagicu. Dhaisu's closest relative in the Thagicu group is Kamba (Nurse 1982, 1999, 2000). Dhaisu shares with other Thagicu languages features that include an inherited common lexicon, retention of 7 vowels, absence of spirantization in high vowel environments, and fronting of Proto-Bantu (PB) palatal *\*c* and *\*j* (Nurse 1982). The term spirantization in Bantu languages is used to denote the process of forming fricatives from Proto-Bantu stops that appear before high vowels (Nurse 1982, Schadeberg 1995).

The present study is based on data collected in Dar es Salaam from six speakers of Dhaisu from Bwiti in July-August 2017 and in Tanga in July-August 2018. The dictionary that was subsequently published as Rugemalira et al. (2019) is the main source for this study. The sounds in Dhaisu are compared to the Proto-Bantu forms from Tervuren's Bantu Lexical Reconstruction 3 (BLR3) (Bastin et al. 2002). The paper demonstrates that the Dhaisu vowel system is in transition from a 7-vowel system to a 5-vowel system. It presents further data to confirm this and shows that the change is unusual because it is the mid vowels that are merging, rather than the upward merger to high vowels that is observed in other Bantu languages. The lowering of mid-high vowels was first noted by Nurse (2000). The uneven distribution of the vowel change and the fact that the mid-high vowels are still heard suggest that the change is ongoing. Furthermore, the data demonstrate that the vowel merger is not associated with spirantization.

These findings are presented in the following five sections. §2 presents the phoneme inventory, while evidence of the vowel merger is given in §3 in the form of correspondences in Dhaisu, Kamba, Swahili, Sambaa, and Digo. §4 illustrates the vowel change as attested in nominal affixes and verbal suffixes. §5 contains examples of consonant reflexes which reveal that previous changes in the consonants were not associated with spirantization. Concluding remarks and directions for further studies appear in §6.

## **2 Dhaisu phonemes**

This section presents the vowel and consonant phonemes of Dhaisu and sets the stage for comparing them to Proto-Bantu forms. The Dhaisu sound system consists of 7 vowels, reflecting the Proto-Bantu vowel system. There are three front vowels, three back vowels and a central open vowel. These are arranged in four height levels, as shown in Figure 1.

$$\begin{array}{ccc} \text{i} & & \text{u} \\ \text{ɪ} & & \text{ʊ} \\ \text{ɛ} & & \text{ɔ} \\ & \text{a} & \end{array}$$

Figure 1: Dhaisu vowel inventory (Rugemalira et al. 2019: 4)

These seven vowels are recognized by Nurse (2000) and Rugemalira et al. (2019). Furthermore, Nurse (2000: 20) reports on the difficulties researchers faced in identifying seven distinctive vowels. The current study faced the same problem: we were unable to elicit minimal pairs contrasting mid-high and mid-low vowels. Clear pairs were elicited that contrasted the high vowels /i, u/ with the mid vowels /ɛ, ɔ/.


The sounds /i/ and /ɛ/ are distinct, as are /u/ and /ɔ/, as the minimal pairs in (1) demonstrate. However, the contrast between the mid vowels *ɪ* vs *ɛ* and *ʊ* vs *ɔ* is not easy to establish. The same words may be pronounced in two different ways by the same speaker. Whenever the researchers sought to get a clearer repetition, the speakers would pronounce the mid-low vowels. Here are some examples of words pronounced with alternative mid vowels. The speakers also insisted on writing with the mid-low vowels. This preference is reflected in the orthography adopted in Rugemalira et al. (2019).


This is likely why Nurse (2000) notes that both 7-vowel and 5-vowel transcriptions appear in previous documentation. Vowel length is not contrastive.

The consonant system consists of 25 consonants as presented in Table 1. There are 8 stops, paired into voiced/voiceless at four places of articulation, namely, bilabial, alveolar, palatal and velar. There are two pairs of fricatives, *f/v* and *s/z*. In some cases, /β/ is heard as a possible variant of /v/. In addition, there is a palatal fricative /ʃ/ and a glottal fricative /h/. The four nasal consonants are bilabial /m/, alveolar /n/, palatal /ɲ/ and velar /ŋ/. The prenasalized stops /mb, nd, ɲɟ, ŋg/ occur at the same four places. Both /r/ and /l/ are heard, with /r/ heard more frequently. The distribution of the two non-contrastive liquids is not yet clear.


Table 1: Consonant inventory (Rugemalira et al. 2019: 5)

Some sounds appear mixed possibly due to the influence of the Swahili alphabet as speakers insisted on how some sounds should be written even though Swahili and Dhaisu have significant differences. Dhaisu speakers also speak and write Swahili. The idea of Swahili influence on Dhaisu writing, therefore, is not far-fetched. Nurse (2000) also notes such mixtures. Some of the vowel features may be influenced by tones, which have not been marked in this study.

### **3 Vowel merger**

As noted in the foregoing section and as discussed in Nurse (2000), one can perceive the seven vowels in Dhaisu. But finding contrastive pairs for the intermediate vowels is an elusive endeavour. We conclude that this is due to the current state of transition from a seven-vowel system to a five-vowel system. In this section, we present data that show that Dhaisu is exhibiting the loss of distinction between /ɪ/–/ɛ/ and /ʊ/–/ɔ/. This is very interesting in the light of Schadeberg's observation that the merger of intermediate high vowels *\*i/\*e* and *\*u/\*o* is unknown (Schadeberg 1995: 74). The base forms are the Proto-Bantu vowels shown in Figure 2.

Figure 2: Proto-Bantu vowels (Meeussen 1967: 82)

The more widely attested 7 > 5 vowel change merges the so-called super-high vowels *\*i̧* and *\*u̧* with the high vowels *\*i* and *\*u*, respectively. Data from Dhaisu reveal that intermediate *\*i* and *\*e* merge to /ɛ/ and *\*u* and *\*o* merge to /ɔ/. This is summarized in Figure 3. The more usual merger, exemplified here by Swahili in Figure 4, merges up, while the unusual merger in Dhaisu merges down.


Figure 3: Dhaisu vowel merger

Figure 4: Swahili merger
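The contrast between the two merger directions can be sketched as a set of lookup tables. The following is only an illustrative sketch, not part of the original analysis: the table and function names are hypothetical, the Kamba and Dhaisu values follow the inventories stated in this chapter, and writing the Swahili mid vowels as plain *e* and *o* is a transcription assumption.

```python
# Illustrative sketch only: names below are hypothetical, not from the source.
SUPER_I = "i\u0327"  # PB super-high front vowel (i with cedilla)
SUPER_U = "u\u0327"  # PB super-high back vowel (u with cedilla)

# Proto-Bantu vowels, ordered from super-high front to super-high back.
PB_VOWELS = [SUPER_I, "i", "e", "a", "o", "u", SUPER_U]

# 7-vowel retention (Kamba-type): every PB vowel keeps a distinct reflex.
KAMBA = dict(zip(PB_VOWELS, ["i", "ɪ", "ɛ", "a", "ɔ", "ʊ", "u"]))
# Upward merger (Swahili-type): super-high and high vowels fall together.
SWAHILI = dict(zip(PB_VOWELS, ["i", "i", "e", "a", "o", "u", "u"]))
# Downward merger (Dhaisu-type): mid-high vowels fall together with mid-low.
DHAISU = dict(zip(PB_VOWELS, ["i", "ɛ", "ɛ", "a", "ɔ", "ɔ", "u"]))


def merged_groups(reflexes):
    """Return {reflex: [PB vowels]} for every reflex shared by 2+ PB vowels."""
    groups = {}
    for pb_vowel, reflex in reflexes.items():
        groups.setdefault(reflex, []).append(pb_vowel)
    return {r: pbs for r, pbs in groups.items() if len(pbs) > 1}


if __name__ == "__main__":
    for name, table in [("Kamba", KAMBA), ("Swahili", SWAHILI), ("Dhaisu", DHAISU)]:
        print(name, merged_groups(table))
```

Under these assumptions, the Kamba-type table yields no merged groups, the Swahili-type table groups each super-high vowel with its high counterpart, and the Dhaisu-type table groups *\*i* with *\*e* and *\*u* with *\*o*.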

This downward merger is most likely not due to loans since there are no languages in the region that have had such a merger. We compare PB reflexes in Dhaisu and in languages that may have had great influence on it. The languages that have the most influence on Dhaisu are Kamba (E55), its closest relative; Swahili (G42a), the national language in which all Dhaisu speakers are fluent; Sambaa (G23), a neighboring language; and Digo (E73), another neighboring language. The following sets of examples illustrate that Dhaisu is unlike the seven-vowel system of Kamba and the Thagicu languages of Kenya, and unlike five-vowel systems like Swahili, Sambaa, and Digo. Data for the comparison are from BLR3 (Bastin et al. 2002) for Proto-Bantu forms, Nurse & Hinnebusch (1993) and TUKI (1996) for Swahili, Nurse & Philippson (1975) for Sambaa and Kamba, and Mwalonya et al. (2004) for Digo.

The data show the critical difference between Dhaisu and other 5-vowel systems such as Swahili, Sambaa, and Digo. These attest to *\*i* > *ɛ* (Table 3) and *\*u* > *ɔ* (Table 7). The corresponding innovations in Swahili and Sambaa are *\*i > i* (Table 3) and *\*u > u* (Table 7).

The examples demonstrate three sets of reflexes of PB vowels:

a. Kamba representing 7-vowel systems /i, ɪ, ɛ, a, ɔ, ʊ, u/ retained from Proto-Bantu;


These patterns are represented in Table 9.


Table 2: Proto-Bantu *\*i̧*

Table 3: Proto-Bantu *\*i*


Table 4: Proto-Bantu *\*e*



Table 5: Proto-Bantu *\*a*

Table 6: Proto-Bantu *\*o*


Table 7: Proto-Bantu *\*u*


Table 8: Proto-Bantu *\*u̧*



Table 9: Vowel correspondences

Nurse (2000: 20) suggests that more than half of Dhaisu vocabulary is from neighboring languages. A possible source of this merger could therefore be loans from those languages. However, as the change *\*ɪ/\*ʊ > ɛ/ɔ* is not attested in any other language that may have greatly influenced Dhaisu, this change must be considered unique to this language. The lowering of PB mid-high vowels, *\*i > ɛ* and *\*u > ɔ*, appears to be in progress and incomplete, because it has not taken place in some words and some environments.

This section has presented lexical data that reveal a unique 7 > 5 vowel change in Dhaisu. The merger is evident in other environments too, to which we now turn.

### **4 Evidence from affixes**

Another body of data that attests to the merger is affixes. In this section, we highlight (a) nominal affixes and (b) the applicative suffix.

#### **4.1 Nominal affixes**

Like other Bantu languages, Dhaisu categorizes nouns into classes. There are 17 classes in Dhaisu. Some of the classes are semantically transparent and others are not. One important feature of the noun classes is the nominal prefixes that characterize each class. The classes are labeled using numbers, as is commonly done in Bantu linguistics, and reconstructed PB forms. Dhaisu's correspondences of class prefixes with mid-high vowels in Proto-Bantu provide useful evidence. These are Classes 1, 3, 4, 7, 11, 13, 14, 15, and 17. Nurse (2000: 22) suggests that the vowels for the prefixes in Dhaisu are mid-high. The relevant prefixes are shown in bold in Table 10.


Table 10: Class prefixes compared (Meeussen 1967: 97; Rugemalira et al. 2019: 14)

Evidently, not all intermediate high vowels in prefixes have merged to mid-low. Classes 1 and 3 lowered *\*mu- > mɔ-*, just as Classes 4 and 11 lowered *\*mi- > mɛ-* and *\*du- > rɔ-*, respectively. Meanwhile, Classes 7 and 14 have merged higher as *\*ki- > ki-* and *\*bu- > u-*. Nurse (2000: 32) indicates Class 15 as *\*ku- > ku-*. The nominal prefixes, therefore, provide examples of both downward merger and upward merger of the same vowel. Further investigation should be able to determine the exact patterns of the change and possible factors, such as contact with different languages.
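The split between lowered and raised prefixes can be tabulated mechanically. The sketch below is a hypothetical illustration that uses only the prefix pairs cited in the paragraph above; the function names and the labels "lowered" and "upward" are assumptions introduced for the example, not terminology from the sources.

```python
# Illustrative sketch only: identifiers are hypothetical.
# Noun class -> (reconstructed PB prefix, Dhaisu prefix), as cited in the text.
PREFIXES = {
    1: ("mu", "mɔ"),
    3: ("mu", "mɔ"),
    4: ("mi", "mɛ"),
    7: ("ki", "ki"),
    11: ("du", "rɔ"),
    14: ("bu", "u"),
    15: ("ku", "ku"),
}

# Expected outcome of the downward merger for each PB mid-high vowel.
LOWERED = {"i": "ɛ", "u": "ɔ"}


def direction(pb_prefix, dhaisu_prefix):
    """Classify a prefix by comparing its final vowel in PB and in Dhaisu."""
    pb_vowel, reflex = pb_prefix[-1], dhaisu_prefix[-1]
    if reflex == LOWERED.get(pb_vowel):
        return "lowered"  # mid-high vowel realized as mid-low
    if reflex == pb_vowel:
        return "upward"   # mid-high vowel realized as high
    return "other"


def group_by_direction(prefixes):
    """Return {direction: [noun classes]} in ascending class order."""
    groups = {}
    for noun_class, (pb, dhaisu) in sorted(prefixes.items()):
        groups.setdefault(direction(pb, dhaisu), []).append(noun_class)
    return groups


if __name__ == "__main__":
    print(group_by_direction(PREFIXES))
```

On the prefixes listed here, Classes 1, 3, 4, and 11 fall in the lowered group and Classes 7, 14, and 15 in the upward group, which is the mixed picture the text describes.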

#### **4.2 Verbal suffix**

Dhaisu has a system of verbal derivations, a system that characterizes Bantu languages. This includes verbal suffixes for causative, applicative, reciprocal, passive, neuter, and separative. These have also been reconstructed to some protoforms with high vowels, mid-high vowels, and the low vowel (Meeussen 1967). Four verb extensions in Dhaisu may be traced to PB mid-high vowels. They are:


Of these, only the applicative appears relatively consistently in Dhaisu as *-ɛr*. The applicative suffix, which is reconstructed as *\*id* in Proto-Bantu (Meeussen 1967: 92), shows evidence of downward merger. The reflex in Dhaisu is *-ɛr-*, as shown in the following examples.


In some cases, the applicative may still be pronounced as *-ir* as can be seen in Rugemalira et al. (2019: 25). It is quite possible that such forms result from Swahili influence. Unlike other languages in the area, Dhaisu does not exhibit vowel harmony for such an extension. An invariant *-ɛr-* appears even though Nurse (2000: 40) suggests that some are realized as *-ɛl-* and some are *-ir-*.

The neuter affix, also known as stative, is less consistent in the realization of the vowel, as the following examples show.


This set contains words some of which take *-ɛk* and others that take *-ik* as the neuter suffix. They do not appear to be motivated by vowel harmony. Considering that *-ɛk* is not used the same way in any of the languages that seem to influence Dhaisu, it must be an innovation in this language. This innovation is consistent with the ongoing merger of the intermediate vowels.

The reversive, also known as separative, appears as a rather unproductive suffix applying to very few items that cannot lead us to a reasonable generalization. The passive is generally realized as a /w/ followed by the final vowel /a/.

From the evidence presented, we conclude that the intermediate vowels are in the process of merging. The indeterminacy that is sometimes observed is due to the fact that the changes are ongoing and do not affect all cases in the same way. That is why there is a mixture of high-mid and low-mid vowels in the class prefixes, and why both appear in some words, as in *\*gùdù* > *kɔru* 'leg'. There are many cases that appear to indicate that the lowering may not affect vowels in word-final position, as in *\*kuku > ŋgɔku* 'chicken'.

Schadeberg (1995) notes, among other things, the co-occurrence of spirantization and 7 > 5 vowel change in Bantu languages and that the 7 > 5 vowel change was invariably preceded by spirantization. He also notes that only a few languages have had spirantization without vowel merger. Most notable is that the merger of intermediate high vowels *\*i/\*e* and *\*u/\*o* is unknown in the languages studied (Schadeberg 1995). The data in Dhaisu are interesting in at least two respects. One is that the merger is rather recent and no spirantization is co-occurring. The other is that the vowel merger that is associated with spirantization involved mid-high and high vowels, whereas the Dhaisu merger involves mid-high and mid-low vowels. Only very few languages attest to 7 > 5 change without spirantization (Bostoen 2008). This leads us to consider consonant reflexes.

### **5 Consonant reflexes**

In this section, we present Dhaisu's reflexes of Proto-Bantu consonants and examine various processes that have affected the consonant inventory. The reflexes are presented in Nurse (1982: 204) where they are compared to other Thagicu languages. This section provides examples of words with the reflexes.

All stops appear to have undergone significant innovations that are not associated with spirantization or fricativization before *\*i̧* and *\*u̧* (Nurse 1982, 2000). In the neighboring Seuta languages (Shambala, Bondei, Digo, Segeju) and Sabaki (e.g. Swahili, Mijikenda), as well as in the entire Northeast Coast Bantu, stops became fricatives before the two high vowels (Nurse & Hinnebusch 1993). Again, our reference point is Proto-Bantu as shown in Table 11.


Table 11: Proto-Bantu (Meeussen 1967: 83)

The exact nature of the PB voiced stops is a subject of considerable controversy and disagreement (Hyman 2019, Mould 1972). Whether they are stops *\*b, \*d, \*g* (Guthrie 1967–1971, Meeussen 1967, Meinhof 1932) or continuants *\*β, \*l, \*ɣ* (Meinhof 1932) does not affect the findings. We describe (a) fricativization of *\*p*, (b) voiceless stops that did not change, (c) various reflexes of voiced consonants, and (d) changes of PB palatal consonants.

Table 12: PB-Dhaisu consonant reflexes


Proto-Bantu voiceless bilabial stop *\*p* became voiced and eventually changed into a labio-dental fricative /v/ in Dhaisu. Examples in (5) demonstrate this.

(5) \*p > v


The *\*p* > *v* change is attested in all environments. The change from a stop to fricative is the kind of spirantization that is not associated with super-high vowels noted in other Bantu languages.

In Dhaisu, the voiceless alveolar stop and voiceless velar stop have remained stable and have not been subject to the changes observed with other consonants. The two sets in (6) and (7) provide examples.

(6) \*t > t


(7) \*k > k


The two consonants have remained stable with the reflexes showing *\*t > t* and *\*k > k* in all environments.

Proto-Bantu voiced stops *\*b*, *\*d*, and *\*g* have reflexes that show significant changes before all vowels. The bilabial *\*b* has been lost before all vowels except in cases where it appears in post-nasal position.

(8) \*b > ∅


The plural form *\*batu* 'people', for example, is *atu* in Dhaisu. Likewise, *\*bona* 'see' has become *ɔna*. In other environments, this loss was followed by glide formation when a prefix is attached, as in *\*bidi* > *mwɛrɛ* 'body', where the prefix *\*mu-* has become *mw-* before the now vowel-initial stem. The change *\*b* > ∅ is widely attested in the language. A similar case is observed with the reflex of the Proto-Bantu voiced velar stop *\*g*. Consider the following example reflexes.

(9) \*g > ∅


In *\*tega > tɛa* 'set a trap,' the velar stop is deleted as is the case in several other words such as *\*mbògà > mboa* 'vegetable'. The change *\*gùndà* 'forest' > *mnda* 'farm' involved the deletion of the velar stop as well as the vowel *u*. It is important to note, once more, that both *\*b* > ∅ and *\*g* > ∅ are not associated with the so-called super-high vowels.

Proto-Bantu alveolar consonant *\*d* has the alveolar /r/ as its reflex. This is illustrated in the following examples.

(10) \*d > r


Occasionally, a lateral liquid /l/ appears as the reflex, but in the majority of cases the reflex is /r/. As with other consonants, there are no effects of the high vowels associated with spirantization.

The reflexes of the palatal consonants *\*c* and *\*j* reveal a more dramatic shift in the consonants. Both are fronted in Dhaisu. Proto-Bantu *\*c* became /d/, which can occasionally be heard as a dental /d̪/.

(11) \*c > d


Voiceless *\*c* became /d/. The dentalization that is sometimes heard is a feature which identifies Dhaisu with Thagicu languages but also sets it apart from the rest. Thagicu languages have developed interdental fricatives for Proto-Bantu *\*c* (Nurse 1982). Dhaisu distinguishes itself from Thagicu in this by not having an interdental fricative.

Proto-Bantu voiced palatal consonant *\*j* shifted to a voiceless alveolar fricative /s/ in Dhaisu.

(12) \*j > s


Unlike its voiceless counterpart, the PB voiced *\*j* devoiced and the loss of occlusion created a fricative. Compared to other consonants, the PB reflexes of *\*c* and *\*j* are relatively few.

To sum up, we have identified various reflexes of PB consonants and noted the absence of patterns of spirantization, an absence that appears to be prevalent in Thagicu languages. PB bilabial *\*b* and velar *\*g* have been lost. A notable feature has been the dramatic shift of palatal *\*c* and *\*j* to /d/ and /s/, respectively. The sounds /p, b, f, g, h, z/ found in Dhaisu are non-inherited (Nurse 2000) and not a result of the spirantization observed in other Bantu zones.
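The consonant correspondences described in this section can be sketched as a simple substitution table. The following is a rough illustration under stated assumptions, not a model of the actual sound changes: the identifiers are hypothetical, input forms are written in plain ASCII without tone marks, vowels are deliberately left untouched, and post-nasal retention is modelled only for *\*b*, the one case the text states explicitly.

```python
# Illustrative sketch only: identifiers are hypothetical.
# PB consonant -> Dhaisu reflex, as described in this section ("" = loss).
CONSONANT_REFLEXES = {
    "p": "v",
    "t": "t",
    "k": "k",
    "b": "",
    "d": "r",
    "g": "",
    "c": "d",
    "j": "s",
}

NASALS = {"m", "n"}


def apply_consonant_reflexes(pb_form):
    """Map PB consonants to their Dhaisu reflexes; vowels are not changed."""
    out = []
    for i, segment in enumerate(pb_form):
        post_nasal = i > 0 and pb_form[i - 1] in NASALS
        if segment == "b" and post_nasal:
            out.append(segment)  # *b is retained after a nasal
        else:
            # Segments without an entry (vowels, nasals) pass through unchanged.
            out.append(CONSONANT_REFLEXES.get(segment, segment))
    return "".join(out)


if __name__ == "__main__":
    for form in ["batu", "bona", "tega", "mboga"]:
        print(form, "->", apply_consonant_reflexes(form))
```

Because the vowel changes are left out, the output for a form such as *\*tega* keeps the PB vowel, whereas the attested Dhaisu form is *tɛa*; the sketch isolates the consonant correspondences only.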

One question that arises is this: if the reflexes of *\*b* and *\*g* are ∅, then where did /b, g/ in contemporary Dhaisu come from? Likewise, in spite of *\*p > v*, there is /p/ in the language today. There are several sources. One source of contemporary forms is loanwords. Recall that, as Nurse (2000: 20) notes, as much as 60 percent of Dhaisu vocabulary is borrowed. Many words with such consonants are from Swahili and neighboring languages. For example, words like *bahati* 'luck' and *bei* 'price' are words with /b/ found in the language. These words are from Arabic (TUKI 1996), borrowed via Swahili. Another likely source of such sounds is phonological innovations that may have happened. Consider the words in (13):

(13) Examples of Dahl's law


These are traces of Dahl's law, a dissimilation phenomenon observed in some Bantu languages in which the voiceless consonant of the first syllable in a sequence of two voiceless onsets becomes voiced. Thus, although PB *\*g* > ∅, the voicing of *\*k* results in /g/, as in *\*kú̧tà > maguta* 'oil'. Such processes have contributed to the inventory of phonemes in contemporary Dhaisu.
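The dissimilation just described can be sketched over a list of syllables. This is a minimal illustration under assumptions: the function name and the representation of syllables as (onset, vowel) pairs are hypothetical, and the set of stops that participate (here *p, t, k*) is an assumption, since the text attests only the voicing of *\*k*.

```python
# Illustrative sketch only: identifiers are hypothetical.
# Voiceless stop -> voiced counterpart (participating set is an assumption).
VOICED_COUNTERPART = {"p": "b", "t": "d", "k": "g"}


def dahls_law(syllables):
    """Voice a voiceless onset when the next syllable's onset is also voiceless.

    `syllables` is a list of (onset, vowel) pairs; an empty onset is "".
    Each decision is based on the original onsets, not on already-voiced ones.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        onset, vowel = syllables[i]
        next_onset = syllables[i + 1][0]
        if onset in VOICED_COUNTERPART and next_onset in VOICED_COUNTERPART:
            result[i] = (VOICED_COUNTERPART[onset], vowel)
    return result


if __name__ == "__main__":
    # Stem of PB 'oil' with tone and vowel-height marks omitted: k-u, t-a.
    print(dahls_law([("k", "u"), ("t", "a")]))
```

Applied to the stem of the 'oil' example, the first onset is voiced while the second is left unchanged, matching the *-guta* portion of *maguta*; a stem whose second onset is already voiced or empty is returned as is.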

## **6 Concluding remarks and future research**

In this paper we set out to describe Proto-Bantu reflexes in Dhaisu. Using lexical items from Bantu Lexical Reconstruction, we have demonstrated one remarkable feature in Dhaisu, namely, the 7 > 5 vowel merger that is taking place. Although 7 > 5 vowel merger is widespread in Bantu, Dhaisu is unique in that it is merging mid-high and mid-low vowels rather than merging mid-high and high vowels. Future research calls for an exploration of tone and its possible role in the merger and other processes. The fact that the intermediate vowels are merging raises the question of what the effects of such a merger are on the quality of the high vowels. It may also be the case that the mid-high vowels are acoustically closer to mid-low vowels. Acoustic studies will shed some light on this question. The data on which this study is based were collected through elicitation of words. Oftentimes a request to repeat a word with an intermediate high vowel brought a different vowel. Studying vowels from texts and narratives rather than elicited wordlists may result in clearer data. The paper has also presented data that show that Dhaisu, like other Thagicu languages, did not undergo spirantization before high vowels, a process that is linked to changes in the Bantu vowel systems.

## **Abbreviations**


## **Acknowledgements**

This work would not have been possible without the generous participation of the native speakers of Dhaisu, particularly Omari Gauwa, Juma Faki, Juma Mwinyihamisi and Fatuma, all of Bwiti. Many thanks to Mohammed Rafiq Yunus whose initial work on the Dhaisu people led to this study. He put us in contact with the Dhaisu speakers and paved the way for much of the project. We appreciate comments from reviewers who helped us clarify many points in the paper. The fieldwork on documentation of Dhaisu, on which this paper is based, was funded by Michigan State University's Alliance for African Partnership.

### **References**


# **Name index**

Aarts, Bas, 56 Abels, Klaus, 118 Aboh, Enoch Oladé, 117, 268 Abubakari, Hasiyatu, 124 Abusch, Dorit, 50 Adams, Nikki, 197 Adler, Allison N., 29 Agbayani, Brian, 118 Agbedor, Paul, 268 Akpanglo-Nartey, Rebecca Atchoi, 126, 143 Akrofi Ansah, Mercy, 128, 132 Amaechi, Mary, 108, 111, 114, 118 Ameka, Felix K., 124, 133 Anagnostopoulou, Elena, 249 Anderson, Heidi, 395 Anderson, Stephen R., 50 Andrews, Avery, 228 Angogo Kanyoro, Rachel Msimbi, 391 Appleby, Leonora L., 396 Asher, Nicholas, 184 Ashton, Ethel O., 48–50, 168, 228, 232, 239 Atintono, Samuel A., 347, 348 Austin, Peter K., 346, 348, 361 Avolonto, Aimé, 212 Baker, Mark C., 155, 172, 174, 175, 196, 205, 254, 267, 268 Bambi-Idikay, Annie, 26, 27 Barbosa, Pilar, 220

Barner, David, 294 Bastin, Yvonne, 410, 413 Bays, Alison M., 318 Bazalgette, Timothy, 112 Beck, Sigrid, 114, 179 Bendor-Samuel, John, 347 Berko, Jean, 26 Berlin, Isaiah, 321, 341 Bisilki, Abraham Kwesi, 126, 143 Bleek, Wilhelm H. I., 48 Blommaert, Jan, 309, 310 Boadi, Lawrence A., 127 Bodomo, Adams, 1, 124, 127, 143, 268, 347, 351 Boersma, Paul, 2, 15, 30 Bokamba, Eyamba G, 31, 33 Bokamba, Georges D., 48, 49, 62 Bošković, Željko, 277 Bostoen, Koen, 166, 168, 419 Bowern, Claire, 348, 360–362 Bowler, Margit, 174 Boyd, Raymond, 17 Boyeldieu, Pascal, 9, 10, 13, 14, 16, 17, 19 Branan, Kenyon, 149, 151, 152, 154, 157 Branigan, Phil, 254, 259, 262 Bresnan, Joan, 223 Brown, Gillian, 394, 396, 398 Brown, Rhonda, 318, 320 Bruening, Benjamin, 254, 259 Brunache, Paul, 20

Bryant, David, 399, 401 Buell, Leston, 197, 198 Bunt, Harry C., 291 Byrd, Dani, 3, 6 Calabrese, Andrea, 26 Cammenga, Jelle, 52 Campbell, George L, 31, 33 Carrel, Patricia L., 108 Casali, Roderic F., 19 Chagas Jeremy, E., 48, 52, 53 Chavasu, Henry O., 395 Chelliah, Shobhana L., 130, 348 Cheng, Chung-Ying, 291 Cheng, Lisa Lai-Shen, 294 Cheung, Pierina, 294 Chiarcos, Christian, 238 Chierchia, Gennaro, 220, 223, 293 Childs, George T., 131 Cho, Mi-Hui, 29 Chomsky, Noam, 118, 150, 157, 223 Cinque, Guglielmo, 158, 273 Cloarec-Heiss, France, 9, 10, 13, 14, 16, 17, 19, 20 Collins, Chris, 179, 196, 205, 267, 268 Comrie, Bernard, 229, 232, 239 Crothers, John, 10 Crystal, David, 348 Culicover, Peter W., 223 Davis, Stuart, 29 De Blois, Kornelis F., 48, 51, 54, 57, 60 De Dreu, Merijn, 197 De Pauw, Guy, 77 De Schryver, Gilles-Maurice, 77 De Swart, Henriëtte, 293 Deal, Amy Rose, 291, 293, 294 Déchaine, Rose-Marie, 212

Demuth, Kathrene, 227, 230 Dewees, John William, 48, 50 Diercks, Michael, 207, 248, 249 Dik, Simon C., 107, 123, 127, 128, 131 Dimitriadis, Alexis, 169, 170, 175 Doetjes, Jenny, 291, 293 Dolphyne, Florence Abena, 265, 266 Dom, Sebastian, 174 Dowty, David R., 223 Drubig, Hans Bernhard, 127 Dubinsky, Stanley, 169, 170, 172 Dumah, Irene, 136, 143 Dupoux, Emmanuel, 27 Durrleman, Stephanie, 229 Elugbe, Ben Ohi, 5 Emenanjo, E. Nolue, 108 Erlewine, Michael Y., 118, 162 Essegbey, James, 348 Fallowfield, Lesley, 318 Fanselow, Gisbert, 116 Farkas, Donka, 293 Fennig, Charles D., 9, 124, 368, 409 Ferrari-Bridgers, Franca, 48–50 Ferraro, Gary, 370, 371 Fiedler, Ines, 107, 108, 114, 116, 117, 119 Fitch, Walter M., 401 Flavier, Sébastien, 12 Folli, Raffaella, 173 Freese, Jeremy, 318, 320 Fretheim, Thorstein, 65, 238 Gambarage, Joash Johannes, 48, 49 Gelas, Hadrien, 77 Georgi, Doreen, 108, 118 Gippert, Jost, 348 Giusti, Giuliana, 69 Givón, Talmy, 48, 49 Gluckman, John, 174

Goldsmith, John, 76, 77, 109 Green, David W., 35 Green, Georgia M., 232 Green, Margaret M., 108 Greenberg, Joseph H., 11 Grenoble, Lenore A., 348, 349, 361 Grimm, Scott, 291, 294 Grohmann, Kleanthes, 118 Grosjean, François, 34, 35 Grosz, Barbara J., 232 Gundel, Jeanette K., 65, 230, 238 Guthrie, Malcolm, 48, 165, 384, 410, 420 Hacquard, Valentine, 273 Haida, Andreas, 114 Hale, Ken, 167, 177, 178, 348 Halle, Morris, 5 Halliday, Michael A. K., 238 Halpert, Claire, 149, 151, 157, 196–198, 200, 206, 207 Hamann, Silke, 30 Hammarström, Harald, 391, 392 Harford, Carolyne, 227, 230 Harley, Heidi, 173, 175 Harris, Zellig S., 79 Hartell, Rhonda L, 347 Hartmann, Katharina, 131 Haspelmath, Martin, 174 Hasselbring, Sue, 129 Heidinger, Steffen, 174 Heine, Bernd, 373, 374, 384–386, 390, 391, 393–395 Hemforth, Barbara, 229 Himmelmann, Nikolaus P., 346, 348, 370 Hinnebusch, Thomas J., 413, 419 Hiraiwa, Ken, 268, 270, 271 Hornstein, Norbert, 218, 223

Horvath, Julia, 114 Huang, CT James, 215 Huang, Yan, 228 Hudu, Fusheini, 5, 124 Huntingford, G.W.B., 394 Hurskainen, Arvi, 77 Hyman, Larry M., 48, 50, 129, 420 Igwe, G. E., 108 Ikekeonwu, Claro I., 109, 114 Inagaki, Shunji, 294 Issah, Samuel Alhassan, 124, 132, 143 Jackendoff, Ray, 107, 223 Jansen, Bert, 268 Jefferson, Gail, 328 Jenkins, Valerie, 318 Jenks, Peter, 158 Jesus, Luis M.T., 2 Johnson, Frederick, 370 Johnson, Kyle, 179 Jones, Taylor, 200 Kalinowski, Cristin, 108 Kalmykova, E. S., 320, 341 Kamanda-Kola, Roger, 10 Kandybowicz, Jason, 265, 268, 273 Kang, Yoonjung, 26, 27, 29, 30, 32 Katamba, Francis X., 48, 50 Kayne, Richard, 192 Keach, Camillia, 227, 230, 231 Keenan, Edward, 229, 232, 239 Kennedy, Jack, 1 Kenstowicz, Michael, 28, 30 Keupdjio, Hermann Sidoine, 184 Keyser, Samuel Jay, 167, 177, 178 King, Gareth, 31, 33 Kioko, Angelina, 306, 307, 309 König, Christa, 373


Kouankem, Constantine, 190 Kramer, Ruth, 155, 254 Kratzer, Angelika, 268 Krifka, Manfred, 107, 123, 131, 291, 293 Kropp Dakubu, Mary E., 132, 143, 347 Kulikov, Leonid, 174 Kutsch Lojenga, Constance, 11 LaCharité, Darlene, 27 Ladefoged, Peter, 1, 5, 16 Lambrecht, Knud, 123, 230, 238 Landau, Idan, 218, 223 Landi, Germain, 17 Landman, Fred, 293 Larget, Bret, 399 Lasnik, Howard, 223 Leahy, Margaret M., 319 Legate, Julie Anne, 173 Legère, Karsten, 303, 304 Levin, Beth, 167 Levy, Roger, 229 Lewis, M. Paul, 384 Lin, Yen-Hwei, 29, 30 Lindén, Krister, 77 Link, Godehard, 291 Longobardi, Giuseppe, 55 Lyons, Christopher, 50, 62, 63, 65, 66, 68, 69 MacArthur, Julie, 383 MacDonald, Jonathan Eric, 276 MacKenzie, Marguerite, 254, 259, 262 Madigan, Sean William, 220 Madumulla, Joshua, 302, 308, 314 Maho, Jouni Filip, 384–388, 393–395, 410 Majid, Asifa, 130

Mallya, Aurelia, 174, 175 Manfredi, Victor, 108 Marantz, Alec, 175 Marchese, Lynell, 289 Marfo, Charles O., 127 Marlo, Michael R., 395 Martin, Cansada, 268 Matushansky, Ora, 254 Maynard, Douglas W., 317, 318, 320 Mbah, Boniface Monday, 108 Mchombo, Sam, 149, 152–156, 158, 170, 171 Meade, Andrew, 399 Meeussen, A. E., 48, 413, 417, 418, 420 Meinhof, Carl, 420 Mergenthaler, Erhard, 320, 341 Miao, Ruiqin, 29, 30 Mkude, Daniel, 314 Mmaduagwu, Georgina Obiamaka, 109 Möhlig, Wilhelm J. G., 384–386, 390, 391, 393–395 Moñino, Yves, 17 Moseley, Christopher, 349 Motingea Mangulu, André, 31, 33 Mould, Martin J., 48–50, 383, 387, 389, 391, 393, 394, 398, 420 Moulton, Vincent, 399, 401 Movahedi, Siamak, 320, 339, 341 Mpiranya, Fidèle, 97 Mudimbe, V. Y., 26, 27 Mufwene, Salikoko S., 311 Muhirwe, Jackson, 77 Muthwii, Margaret, 309 Muzale, Henry R. T., 308, 313 Mwalonya, Joseph, 413 Naden, Tony, 125, 347

Ndayiragije, Juvénal, 48

Ndimele, Ozo-mekuri, 114 Ngonyani, Deo, 172, 173, 175, 227, 230 Nikolaeva, Irina, 151 Nsoh, Avea E., 347 Nurse, Derek, 387, 389, 390, 396, 409–413, 416–419, 423 Nwachukwu, P. A., 115 Nwankwegu, Jeremiah A., 109, 114 Nyerere, Julius K., 303 Obeng, Samuel Gyasi, 317, 319–322, 328, 341 Odak, Osaga, 395 Odden, David, 392 Ogbulogo, Charles Uzodimma, 114 Olawsky, Knut J., 124 Olson, Kenneth S., 10, 13, 17 Onea, Edgar, 107, 112, 124 Osam, Emmanuel Kweku, 265, 267 Osuagwu, Eunice C., 115 Pagel, Mark, 399 Paradis, Carole, 27 Pearson, Hazel, 184 Peperkamp, Sharon, 27, 29 Pesetsky, David, 265, 270 Petzell, Malin, 48, 49 Philippson, Gérard, 387, 389, 390, 396, 413 Pietraszko, Asia, 196, 200, 202, 205, 206 Pike, Kenneth L., 16 Polinsky, Maria, 254, 259 Polomé, Edgar C., 168 Potsdam, Eric, 254, 259 Preminger, Omer, 155 Prince, Ellen, 238 Progovac, Ljiljana, 48, 49, 63 Purvis, John Bremner, 394

Pylkkänen, Liina, 173, 175 Qorro, Martha, 302 Quine, Willard Van Orman, 291 Rackowski, Andrea, 149, 151, 152, 154, 157, 159 Rice, Keren, 172 Richards, Norvin, 149–152, 154, 157, 159–162 Rizzi, Luigi, 115, 157, 190, 229, 273 Rochemont, Michael Shaun, 114, 131 Rolle, Nicholas, 20 Rooth, Mats, 107, 112 Rose, Yvan, 29 Rosen, Sidney, 318 Rugemalira, Josephat, 302, 308, 313, 409–412, 417, 418 Russell, Joan, 229, 231 Saanchi, James Angkaaraba, 143 Sabelo, Nonhlanhla O., 196, 200 Salzmann, Martin, 247, 248, 250, 257, 262 Samarin, William J., 17, 20 Sande, Hannah, 283, 284 Sauerland, Uli, 293 Schadeberg, Thilo C., 166, 168, 199, 228, 229, 239, 410, 412, 419 Schäffer, Wolfram, 127 Schell, Jane O., 318 Schneider-Zioga, Patricia, 196, 205, 206 Schoenbrun, David L., 394 Schwabe, Kerstin, 123 Schwarz, Anne, 124, 125, 128, 129, 131, 138–140, 142 Sebba, Mark, 268 Segerer, Guillaume, 12

Seidl, Amanda, 169, 170, 175 Shadle, Christine H., 2 Shinohara, Shigeko, 26 Sidner, Candace L., 232 Siertsema, Berthe, 394 Sikuku, Justine, 248, 249, 254, 256 Silverman, Daniel, 29, 30 Simango, Silvester Ron, 166, 167, 169, 170, 172 Simon, Donald L., 399 Simons, Gary F., 9, 124, 368, 409 Skopeteas, Stavros, 116, 127, 128, 130, 131 Smith, Carlota S., 276 Smith, Jennifer L., 29 Smith, Peter W., 132 Speas, Peggy, 258 Spiegel, Wolfgang, 319 Starwalt, Coleen G. A., 16 Steele, Mary, 125, 126 Steriade, Donca, 30 Stewart, Osamuyimen T., 267, 268 Stowell, Tim, 212 Suchato, Atiwong, 30 Swadesh, Morris, 388 Sybesma, Rint, 294 Szabolcsi, Anna, 220 Tenny, Carol L., 258 Tesser, Abraham, 318 Théret-Kieschke, Régine, 10, 12 Thomas, Jacqueline M. C., 17 Thomason, Sarah G., 28, 349 Ting, Zeng, 6 Torrego, Esther, 265, 270 Travis, Lisa deMena, 276 Trilsbeek, Paul, 346 Trudell, Barbara, 308 Tuller, Laurice Anne, 114

Uwalaka, M. Angela, 108, 109 Van der Wal, Jenneke, 112 Van Koppen, Marjo, 248, 262 Van Putten, Saskia, 124, 126, 127, 129, 131 Van Urk, Coppe, 149–152, 154, 157, 159–162 Veenstra, Tonjes, 268 Vendelin, Inga, 27, 29 Vendler, Zeno, 167–169 Visser, Marianna, 48, 49, 174, 175 Vitale, Anthony, 227, 230 Voorhoeve, Jan, 183 Wagacha, Peter W., 77 Walczak, Adam, 319 Wasike, Aggrey, 248, 253 Watters, John R., 131 Webelhuth, Gert, 266, 276 Weed, Gretchen, 125, 126 Weenig, Mieneke W. H., 318 Weenink, David, 15 Were, Gideon, 395 Wetzels, W. Leo, 26 Whaley, Lindsay J., 348, 349 Willem, Jules, 348 Williams, Edwin, 218 Williams, Ralph M., 383, 387, 389, 391, 393, 394, 396, 404, 405 Winkelmann, Kirsten, 126 Winkler, Susanne, 123 Winter, Yoad, 50 Wittenburg, Peter, 346 Woodbury, Anthony C., 348, 361 Wurmbrand, Susi, 223 Yip, Moira, 29, 30 Zeijlstra, Hedde, 216


Zeller, Jochen, 197, 198, 206 Zhang, Niina Ning, 276 Zimmermann, Malte, 65, 107, 112, 124, 131

# Descriptive and theoretical approaches to African linguistics

*Descriptive and theoretical approaches to African linguistics* contains a selection of revised and peer-reviewed papers from the 49th Annual Conference on African Linguistics, held at Michigan State University in 2018. The contributions, from both students and more senior scholars based in North America, Africa and other parts of the world, provide a glimpse of the breadth and quality of current research in African linguistics from both descriptive and theoretical perspectives. Fields of interest range from phonetics, phonology, morphology, syntax and semantics to sociolinguistics, historical linguistics, discourse analysis, language documentation, computational linguistics and beyond. The articles reflect both the typological and genetic diversity of languages in Africa and the wide range of research areas covered by presenters at ACAL conferences.